Preface


R has a command line interface that offers considerable advantages over menu 
systems in terms of efficiency and speed once the commands are known and the 
language understood. However, the command line system can be daunting for 
the first-time user, so there is a need for concise texts to enable the student or 
analyst to make progress with R in their area of study. This book aims to fulfil 
that need in the area of time series to enable the non-specialist to progress, 
at a fairly quick pace, to a level where they can confidently apply a range of 
time series methods to a variety of data sets. The book assumes the reader 
has a knowledge typical of a first-year university statistics course and is based 
around lecture notes from a range of time series courses that we have taught 
over the last twenty years. Some of this material has been delivered to post- 
graduate finance students during a concentrated six-week course and was well 
received, so a selection of the material could be mastered in a concentrated 
course, although in general it would be more suited to being spread over a 
complete semester. 

The book is based around practical applications and generally follows a 
similar format for each time series model being studied. First, there is an 
introductory motivational section that describes practical reasons why the 
model may be needed. Second, the model is described and defined in math- 
ematical notation. The model is then used to simulate synthetic data using 
R code that closely reflects the model definition and then fitted to the syn- 
thetic data to recover the underlying model parameters. Finally, the model 
is fitted to an example historical data set and appropriate diagnostic plots 
given. By using R, the whole procedure can be reproduced by the reader, 
and it is recommended that students work through most of the examples.¹
Mathematical derivations are provided in separate frames and starred sections
and can be omitted by those wanting to progress quickly to practical
applications. At the end of each chapter, a concise summary of the R commands
that were used is given followed by exercises. All data sets used in
the book, and solutions to the odd-numbered exercises, are available on the
website http://www.massey.ac.nz/~pscowper/ts.

¹ We used the R package Sweave to ensure that, in general, your code will produce
the same output as ours. However, for stylistic reasons we sometimes edited our
code; e.g., for the plots there will sometimes be minor differences between those
generated by the code in the text and those shown in the actual figures.

We thank John Kimmel of Springer and the anonymous referees for their 
helpful guidance and suggestions, Brian Webby for careful reading of the text 
and valuable comments, and John Xie for useful comments on an earlier draft. 
The Institute of Information and Mathematical Sciences at Massey Univer- 
sity and the School of Mathematical Sciences, University of Adelaide, are 
acknowledged for support and funding that made our collaboration possible. 
Paul thanks his wife, Sarah, for her continual encouragement and support 
during the writing of this book, and our son, Daniel, and daughters, Lydia 
and Louise, for the joy they bring to our lives. Andrew thanks Natalie for 
providing inspiration and her enthusiasm for the project. 


Paul Cowpertwait and Andrew Metcalfe 


Massey University, Auckland, New Zealand 
University of Adelaide, Australia 


December 2008 
1 Time Series Data


1.1 Purpose 


Time series are analysed to understand the past and to predict the future, 
enabling managers or policy makers to make properly informed decisions. 
A time series analysis quantifies the main features in data and the random 
variation. These reasons, combined with improved computing power, have 
made time series methods widely applicable in government, industry, and 
commerce. 

The Kyoto Protocol is an amendment to the United Nations Framework 
Convention on Climate Change. It opened for signature in December 1997 and 
came into force on February 16, 2005. The arguments for reducing greenhouse 
gas emissions rely on a combination of science, economics, and time series 
analysis. Decisions made in the next few years will affect the future of the 
planet. 

During 2006, Singapore Airlines placed an initial order for twenty Boeing 
787-9s and signed an order of intent to buy twenty-nine new Airbus planes:
twenty A350s and nine A380s (superjumbos). The airline’s decision to expand
its fleet relied on a combination of time series analysis of airline passenger 
trends and corporate plans for maintaining or increasing its market share. 

Time series methods are used in everyday operational decisions. For exam- 
ple, gas suppliers in the United Kingdom have to place orders for gas from the 
offshore fields one day ahead of the supply. Variation about the average for 
the time of year depends on temperature and, to some extent, the wind speed. 
Time series analysis is used to forecast demand from the seasonal average with 
adjustments based on one-day-ahead weather forecasts. 

Time series models often form the basis of computer simulations. Some 
examples are assessing different strategies for control of inventory using a 
simulated time series of demand; comparing designs of wave power devices us- 
ing a simulated series of sea states; and simulating daily rainfall to investigate 
the long-term environmental effects of proposed water management policies. 


P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R,
Use R, DOI 10.1007/978-0-387-88698-5_1


1.2 Time series 


In most branches of science, engineering, and commerce, there are variables 
measured sequentially in time. Reserve banks record interest rates and ex- 
change rates each day. The government statistics department will compute 
the country’s gross domestic product on a yearly basis. Newspapers publish 
yesterday’s noon temperatures for capital cities from around the world. Me- 
teorological offices record rainfall at many different sites with differing reso- 
lutions. When a variable is measured sequentially in time over or at a fixed 
interval, known as the sampling interval, the resulting data form a time series. 

Observations that have been collected over fixed sampling intervals form a 
historical time series. In this book, we take a statistical approach in which the 
historical series are treated as realisations of sequences of random variables. A 
sequence of random variables defined at fixed sampling intervals is sometimes 
referred to as a discrete-time stochastic process, though the shorter name 
time series model is often preferred. The theory of stochastic processes is vast 
and may be studied without necessarily fitting any models to data. However, 
our focus will be more applied and directed towards model fitting and data 
analysis, for which we will be using R.¹

The main features of many time series are trends and seasonal variations
that can be modelled deterministically with mathematical functions of
time. But another important feature of most time series is that observations
close together in time tend to be correlated (serially dependent). Much of the
methodology in a time series analysis is aimed at explaining this correlation
and the main features in the data using appropriate statistical models and
descriptive methods. Once a good model is found and fitted to data, the analyst
can use the model to forecast future values, or generate simulations, to
guide planning decisions. Fitted models are also used as a basis for statistical
tests. For example, we can determine whether fluctuations in monthly sales
figures provide evidence of some underlying change in sales that we must now
allow for. Finally, a fitted statistical model provides a concise summary of the
main characteristics of a time series, which can often be essential for decision
makers such as managers or politicians.

Sampling intervals differ in their relation to the data. The data may have
been aggregated (for example, the number of foreign tourists arriving per day)
or sampled (as in a daily time series of close of business share prices). If data
are sampled, the sampling interval must be short enough for the time series
to provide a very close approximation to the original continuous signal when
it is interpolated. In a volatile share market, close of business prices may not
suffice for interactive trading but will usually be adequate to show a company’s
financial performance over several years. At a quite different timescale,
time series analysis is the basis for signal processing in telecommunications,
engineering, and science. Continuous electrical signals are sampled to provide
time series using analog-to-digital (A/D) converters at rates that can be faster
than millions of observations per second.

¹ R was initiated by Ihaka and Gentleman (1996) and is an open source implementation
of S, a language for data analysis developed at Bell Laboratories (Becker
et al. 1988).
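The distinction between aggregated and sampled data can be sketched in R. The daily counts below are simulated, and the names Daily, Weekly.total, and Weekly.sample are hypothetical, chosen only for this illustration; treat it as a minimal sketch rather than a recommended workflow.

```r
# Simulated daily counts for four weeks (hypothetical data, not from the book)
set.seed(1)
Daily <- ts(rpois(28, lambda = 20), frequency = 7)

# Aggregated series: the total count within each week
Weekly.total <- aggregate(Daily, nfrequency = 1, FUN = sum)

# Sampled series: only the first day of each week is kept
Weekly.sample <- window(Daily, frequency = 1)

Weekly.total
Weekly.sample
```

The aggregated series summarises every observation in the week, whereas the sampled series discards six out of every seven values.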


1.3 R language 


It is assumed that you have R (version 2 or higher) installed on your computer,
and it is suggested that you work through the examples, making sure your
output agrees with ours.² If you do not have R, then it can be installed free
of charge from the Internet site www.r-project.org. It is also recommended
that you have some familiarity with the basics of R, which can be obtained
by working through the first few chapters of an elementary textbook on R
(e.g., Dalgaard 2002) or using the online “An Introduction to R”, which is
also available via the R help system; type help.start() at the command
prompt to access this.

R has many features in common with both functional and object oriented
programming languages. In particular, functions in R are treated as objects
that can be manipulated or used recursively.³ For example, the factorial function
can be written recursively as


> Fact <- function(n) if (n == 1) 1 else n * Fact(n - 1)
> Fact(5)
[1] 120


In common with functional languages, assignments in R can be avoided, 
but they are useful for clarity and convenience and hence will be used in 
the examples that follow. In addition, R runs faster when ‘loops’ are avoided, 
which can often be achieved using matrix calculations instead. However, this 
can sometimes result in rather obscure-looking code. Thus, for the sake of 
transparency, loops will be used in many of our examples. Note that R is case 
sensitive, so that X and x, for example, correspond to different variables. In 
general, we shall use uppercase for the first letter when defining new variables, 
as this reduces the chance of overwriting inbuilt R functions, which are usually 
in lowercase.⁴
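As a small illustration of the trade-off between loops and vectorised code, the sketch below computes first differences in both ways. The variable names X, D.loop, and D.vec are hypothetical; the six values are the first six monthly air passenger figures used later in this chapter.

```r
# First differences of a short series: explicit loop versus vectorised call
X <- c(112, 118, 132, 129, 121, 135)

# Loop version: mirrors the definition d[t] = x[t] - x[t-1]
D.loop <- numeric(length(X) - 1)
for (i in 2:length(X)) D.loop[i - 1] <- X[i] - X[i - 1]

# Vectorised version: a single call to diff
D.vec <- diff(X)

D.loop   # 6 14 -3 -8 14
D.vec    # the same values
```

The loop is longer but follows the mathematical definition line by line, which is the reason loops are preferred in many of the examples.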


The best way to learn to do a time series analysis in R is through practice,
so we now turn to some examples, which we invite you to work through.

² Some of the output given in this book may differ slightly from yours. This is most
likely due to editorial changes made for stylistic reasons. For conciseness, we also
used options(digits = 3) to set the number of digits to 4 in the computer output
that appears in the book.

³ Do not be concerned if you are unfamiliar with some of these computing terms,
as they are not really essential in understanding the material in this book. The
main reason for mentioning them now is to emphasise that R can almost certainly
meet your future statistical and programming needs should you wish to take the
study of time series further.

⁴ For example, matrix transpose is t(), so t should not be used for time.


1.4 Plots, trends, and seasonal variation 


1.4.1 A flying start: Air passenger bookings 


The numbers of international passenger bookings (in thousands) per month
on an airline (Pan Am) in the United States were obtained from the Federal 
Aviation Administration for the period 1949-1960 (Brown, 1963). The com- 
pany used the data to predict future demand before ordering new aircraft and 
training aircrew. The data are available as a time series in R and illustrate 
several important concepts that arise in an exploratory time series analysis. 

Type the following commands in R, and check your results against the 
output shown here. To save on typing, the data are assigned to a variable 
called AP. 


> data(AirPassengers) 
> AP <- AirPassengers 
> AP 


     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118 
1950 115 126 141 135 125 149 170 170 158 133 114 140 
1951 145 150 178 163 172 178 199 199 184 162 146 166 
1952 171 180 193 181 183 218 230 242 209 191 172 194 
1953 196 196 236 235 229 243 264 272 237 211 180 201 
1954 204 188 235 227 234 264 302 293 259 229 203 229 
1955 242 233 267 269 270 315 364 347 312 274 237 278 
1956 284 277 317 313 318 374 413 405 355 306 271 306 
1957 315 301 356 348 355 422 465 467 404 347 305 336 
1958 340 318 362 348 363 435 491 505 404 359 310 337 
1959 360 342 406 396 420 472 548 559 463 407 362 405 
1960 417 391 419 461 472 535 622 606 508 461 390 432 


All data in R are stored in objects, which have a range of methods available. 
The class of an object can be found using the class function: 


> class(AP)
[1] "ts"
> start(AP); end(AP); frequency(AP)


[1] 1949 1 
[1] 1960 12 
[1] 12 
In this case, the object is of class ts, which is an abbreviation for ‘time
series’. Time series objects have a number of methods available, which include
the functions start, end, and frequency given above. These methods can be
listed using the function methods, but the output from this function is not
always helpful. The key thing to bear in mind is that generic functions in R,
such as plot or summary, will attempt to give the most appropriate output
to any given input object; try typing summary(AP) now to see what happens.

As the objective in this book is to analyse time series, it makes sense to 
put our data into objects of class ts. This can be achieved using a function 
also called ts, but this was not necessary for the airline data, which were 
already stored in this form. In the next example, we shall create a ts object 
from data read directly from the Internet. 
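Before that, a minimal sketch of the ts function with made-up data may help. The vector Sales and its values are hypothetical and serve only to show how start and frequency determine the time index.

```r
# Hypothetical quarterly sales figures (illustrative values only)
Sales <- c(12, 15, 14, 18, 13, 16, 15, 20)

# Quarterly series starting in the first quarter of 2001
Sales.ts <- ts(Sales, start = c(2001, 1), frequency = 4)

class(Sales.ts)
start(Sales.ts); end(Sales.ts); frequency(Sales.ts)
Sales.ts
```

With eight quarterly values starting in the first quarter of 2001, the series ends in the fourth quarter of 2002.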

One of the most important steps in a preliminary time series analysis is to 
plot the data; i.e., create a time plot. For a time series object, this is achieved 
with the generic plot function: 


> plot(AP, ylab = "Passengers (1000's)") 


You should obtain a plot similar to Figure 1.1 below. Parameters, such as 
xlab or ylab, can be used in plot to improve the default labels. 
Fig. 1.1. International air passenger bookings in the United States for the period
1949-1960.


There are a number of features in the time plot of the air passenger data 
that are common to many time series (Fig. 1.1). For example, it is apparent 
that the number of passengers travelling on the airline is increasing with time. 
In general, a systematic change in a time series that does not appear to be 
periodic is known as a trend. The simplest model for a trend is a linear increase 
or decrease, and this is often an adequate approximation. 
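One way to see a linear trend is to regress the series on its time index. The sketch below does this for the air passenger data; the object name AP.lm is hypothetical, and a straight line is used purely as a first approximation, not as a model we would recommend for these data.

```r
data(AirPassengers)
AP <- AirPassengers

# Regress the series on its time index (measured in years)
AP.lm <- lm(AP ~ time(AP))
coef(AP.lm)   # intercept and slope (thousands of bookings per year)

# Overlay the fitted straight line on the time plot
plot(AP, ylab = "Passengers (1000's)")
abline(AP.lm)
```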
A repeating pattern within each year is known as seasonal variation, although
the term is applied more generally to repeating patterns within any
fixed period, such as restaurant bookings on different days of the week. There 
is clear seasonal variation in the air passenger time series. At the time, book- 
ings were highest during the summer months of June, July, and August and 
lowest during the autumn month of November and winter month of February. 
Sometimes we may claim there are cycles in a time series that do not corre- 
spond to some fixed natural period; examples may include business cycles or 
climatic oscillations such as El Niño. None of these is apparent in the airline
bookings time series. 

An understanding of the likely causes of the features in the plot helps us 
formulate an appropriate time series model. In this case, possible causes of 
the increasing trend include rising prosperity in the aftermath of the Second 
World War, greater availability of aircraft, cheaper flights due to competition 
between airlines, and an increasing population. The seasonal variation coin- 
cides with vacation periods. In Chapter 5, time series regression models will 
be specified to allow for underlying causes like these. However, many time 
series exhibit trends, which might, for example, be part of a longer cycle or be 
random and subject to unpredictable change. Random, or stochastic, trends 
are common in economic and financial time series. A regression model would 
not be appropriate for a stochastic trend. 

Forecasting relies on extrapolation, and forecasts are generally based on 
an assumption that present trends continue. We cannot check this assumption 
in any empirical way, but if we can identify likely causes for a trend, we can 
justify extrapolating it, for a few time steps at least. An additional argument 
is that, in the absence of some shock to the system, a trend is likely to change 
relatively slowly, and therefore linear extrapolation will provide a reasonable 
approximation for a few time steps ahead. Higher-order polynomials may give 
a good fit to the historic time series, but they should not be used for extrap- 
olation. It is better to use linear extrapolation from the more recent values 
in the time series. Forecasts based on extrapolation beyond a year are per- 
haps better described as scenarios. Expecting trends to continue linearly for 
many years will often be unrealistic, and some more plausible trend curves 
are described in Chapters 3 and 5. 
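Linear extrapolation from the more recent values can be sketched as follows, using annual totals of the air passenger data. The choice of the last five years and the names Recent, Year, Total, Fit, and Pred are assumptions made for this illustration only.

```r
# Annual totals of the air passenger series
AP.annual <- aggregate(AirPassengers)

# Keep only the most recent five years (an arbitrary choice for illustration)
Recent <- window(AP.annual, start = 1956)
Year   <- as.numeric(time(Recent))
Total  <- as.numeric(Recent)

# Fit a straight line to the recent values and extrapolate two years ahead
Fit  <- lm(Total ~ Year)
Pred <- predict(Fit, newdata = data.frame(Year = 1961:1962))
Pred
```

Such values are best read as scenarios that assume the recent trend continues unchanged.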

A time series plot not only emphasises patterns and features of the data
but can also expose outliers and erroneous values. One cause of the latter is
that missing data are sometimes coded using a negative value. Such values
need to be handled differently in the analysis and must not be included as
observations when fitting a model to data.⁵ Outlying values that cannot be
attributed to some coding should be checked carefully. If they are correct,
they are likely to be of particular interest and should not be excluded from
the analysis. However, it may be appropriate to consider robust methods of
fitting models, which reduce the influence of outliers.

⁵ Generally speaking, missing values are suitably handled by R, provided they are
correctly coded as ‘NA’. However, if your data do contain missing values, then it
is always worth checking the ‘help’ on the R function that you are using, as an
extra parameter or piece of coding may be required.

To get a clearer view of the trend, the seasonal effect can be removed by 
aggregating the data to the annual level, which can be achieved in R using the 
aggregate function. A summary of the values for each season can be viewed 
using a boxplot, with the cycle function being used to extract the seasons 
for each item of data. 

The plots can be put in a single graphics window using the layout func- 
tion, which takes as input a vector (or matrix) for the location of each plot 
in the display window. The resulting boxplot and annual series are shown in 
Figure 1.2. 


> layout(1:2)
> plot(aggregate(AP))
> boxplot(AP ~ cycle(AP))


You can see an increasing trend in the annual series (Fig. 1.2a) and the sea- 
sonal effects in the boxplot. More people travelled during the summer months 
of June to September (Fig. 1.2b). 
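The seasonal pattern shown by the boxplot can also be summarised numerically. The sketch below uses cycle to label each observation with its month and then averages by month; the name Month.mean is hypothetical, and tapply is just one convenient way of doing this.

```r
AP <- AirPassengers

# cycle gives the position of each observation within the year (1 = Jan, ..., 12 = Dec)
as.vector(cycle(AP))[1:12]

# Mean bookings for each calendar month over 1949-1960
Month.mean <- tapply(as.vector(AP), as.vector(cycle(AP)), mean)
names(Month.mean) <- month.abb
round(Month.mean)
```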


1.4.2 Unemployment: Maine 


Unemployment rates are one of the main economic indicators used by politi- 
cians and other decision makers. For example, they influence policies for re- 
gional development and welfare provision. The monthly unemployment rate 
for the US state of Maine from January 1996 until August 2006 is plotted 
in the upper frame of Figure 1.3. In any time series analysis, it is essential 
to understand how the data have been collected and their unit of measure- 
ment. The US Department of Labor gives precise definitions of terms used to 
calculate the unemployment rate. 

The monthly unemployment data are available in a file online that is read 
into R in the code below. Note that the first row in the file contains the name 
of the variable (unemploy), which can be accessed directly once the attach 
command is given. Also, the header parameter must be set to TRUE so that R 
treats the first row as the variable name rather than data. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/Maine.dat"
> Maine.month <- read.table(www, header = TRUE)
> attach(Maine.month)
> class(Maine.month)


[1] "data.frame" 


Fig. 1.2. International air passenger bookings in the United States for the period
1949-1960. Units on the y-axis are 1000s of people. (a) Series aggregated to the
annual level; (b) seasonal boxplots of the data.


When we read data in this way from an ASCII text file, the ‘class’ is not
time series but data.frame. The ts function is used to convert the data to a
time series object. The following command creates a time series object:


> Maine.month.ts <- ts(unemploy, start = c(1996, 1), freq = 12)


This uses all the data. You can select a smaller number by specifying an 
earlier end date using the parameter end. If we wish to analyse trends in the 
unemployment rate, annual data will suffice. The average (mean) over the 
twelve months of each year is another example of aggregated data, but this 
time we divide by 12 to give a mean annual rate. 


> Maine.annual.ts <- aggregate(Maine.month.ts)/12 
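Dividing the annual sum by 12 is equivalent to asking aggregate for the mean directly. The sketch below checks this on a simulated monthly series, since it does not rely on the online file; the names X.ts, Mean.a, and Mean.b are hypothetical.

```r
# Simulated monthly rates for three complete years (hypothetical data)
set.seed(1)
X.ts <- ts(round(runif(36, min = 3, max = 6), 1),
           start = c(1996, 1), frequency = 12)

# Annual mean as the annual sum divided by 12
Mean.a <- aggregate(X.ts) / 12

# Annual mean obtained directly through the FUN argument
Mean.b <- aggregate(X.ts, FUN = mean)

all.equal(as.vector(Mean.a), as.vector(Mean.b))
```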


We now plot both time series. There is clear monthly variation. From
Figure 1.3(a) it seems that the February figure is typically about 20% more
than the annual average, whereas the August figure tends to be roughly 20%
less.

> layout(1:2) 
> plot(Maine.month.ts, ylab = "unemployed (%)") 
> plot(Maine.annual.ts, ylab = "unemployed (%)") 


We can calculate the precise percentages in R, using window. This function
will extract that part of the time series between specified start and end points
and will sample at yearly intervals if its freq argument is set to TRUE (R
treats TRUE as a frequency of 1). So, the first line below gives a time series
of February figures.


> Maine.Feb <- window(Maine.month.ts, start = c(1996,2), freq = TRUE)
> Maine.Aug <- window(Maine.month.ts, start = c(1996,8), freq = TRUE)
> Feb.ratio <- mean(Maine.Feb) / mean(Maine.month.ts)
> Aug.ratio <- mean(Maine.Aug) / mean(Maine.month.ts)

> Feb.ratio 
[1] 1.223 

> Aug.ratio 
[1] 0.8164 


On average, unemployment is 22% higher in February and 18% lower in 
August. An explanation is that Maine attracts tourists during the summer, 
and this creates more jobs. Also, the period before Christmas and over the 
New Year’s holiday tends to have higher employment rates than the first few 
months of the new year. The annual unemployment rate was as high as 8.5% 
in 1976 but was less than 4% in 1988 and again during the three years 1999-2001.
If we had sampled the data in August of each year, for example, rather
than taken yearly averages, we would have consistently underestimated the 
unemployment rate by a factor of about 0.8. 
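The same calculation can be tried without the online file by using the air passenger data. The sketch below extracts the July figures in two ways and computes the ratio of the July mean to the overall mean; the names AP.Jul, AP.Jul.alt, and Jul.ratio are hypothetical, and frequency = 1 is written out in place of freq = TRUE.

```r
AP <- AirPassengers

# One value per year, starting from July 1949
AP.Jul <- window(AP, start = c(1949, 7), frequency = 1)

# The same values selected by their position in the seasonal cycle
AP.Jul.alt <- as.vector(AP)[as.vector(cycle(AP)) == 7]
all(as.vector(AP.Jul) == AP.Jul.alt)

# Ratio of the July mean to the overall mean
Jul.ratio <- mean(AP.Jul) / mean(AP)
Jul.ratio
```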
Fig. 1.3. Unemployment in Maine: (a) monthly January 1996-August 2006; (b)
annual 1996-2005.

Fig. 1.4. Unemployment in the United States January 1996-October 2006.


The monthly unemployment rate for all of the United States from January
1996 until October 2006 is plotted in Figure 1.4. The decrease in the unem- 
ployment rate around the millennium is common to Maine and the United 
States as a whole, but Maine does not seem to be sharing the current US 
decrease in unemployment. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/USunemp.dat"
> US.month <- read.table(www, header = T)
> attach(US.month)
> US.month.ts <- ts(USun, start=c(1996,1), end=c(2006,10), freq = 12)
> plot(US.month.ts, ylab = "unemployed (%)")


1.4.3 Multiple time series: Electricity, beer and chocolate data 


Here we illustrate a few important ideas and concepts related to multiple time
series data. The monthly supply of electricity (millions of kWh), beer (Ml),
and chocolate-based production (tonnes) in Australia over the period January
1958 to December 1990 are available from the Australian Bureau of Statistics
(ABS).⁶ The three series have been stored in a single file online, which can be
read as follows:


> www <- "http://www.massey.ac.nz/~pscowper/ts/cbe.dat"
> CBE <- read.table(www, header = T)
> CBE[1:4, ]
  choc beer elec
1 1451 96.3 1497
2 2037 84.4 1463
3 2477 91.2 1648
4 2785 81.9 1595
> class(CBE)
[1] "data.frame"


⁶ ABS data used with permission from the Australian Bureau of Statistics:
http://www.abs.gov.au.


Now create time series objects for the electricity, beer, and chocolate data. 
If you omit end, R uses the full length of the vector, and if you omit the month 
in start, R assumes 1. You can use plot with cbind to plot several series on 
one figure (Fig. 1.5). 


> Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12) 
> Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12) 
> Choc.ts <- ts(CBE[, 1], start = 1958, freq = 12) 


> plot(cbind(Elec.ts, Beer.ts, Choc.ts)) 
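The defaults for start and end, and the effect of cbind, can be checked on simulated data without the online file. In the sketch below, A.ts, B.ts, and AB are hypothetical names, and the two series are simply simulated random sequences.

```r
# Two simulated monthly series of four years each (hypothetical data)
set.seed(1)
A.ts <- ts(cumsum(rnorm(48)), start = 1958, frequency = 12)
B.ts <- ts(cumsum(rnorm(48)), start = 1958, frequency = 12)

# With the month omitted from start, the series begins in month 1;
# with end omitted, the full length of the vector is used
start(A.ts); end(A.ts)

# cbind on ts objects gives a multiple time series, plotted in stacked panels
AB <- cbind(A.ts, B.ts)
class(AB)
plot(AB)
```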


Fig. 1.5. Australian chocolate, beer, and electricity production; January 1958-December 1990.


The plots in Figure 1.5 show increasing trends in production for all three
goods, partly due to the rising population in Australia from about 10 million 
to about 18 million over the same period (Fig. 1.6). But notice that electricity 
production has risen by a factor of 7, and chocolate production by a factor of 
4, over this period during which the population has not quite doubled. 

Fig. 1.6. Australia's population, 1900-2000.


The three series constitute a multiple time series. There are many functions
in R for handling more than one series, including ts.intersect to obtain the
intersection of two series that overlap in time. We now illustrate the use of the
ts.intersect function and point out some potential pitfalls in analysing multiple
time series. The intersection between the air passenger data and the electricity
data is obtained as follows:


> AP.elec <- ts.intersect(AP, Elec.ts) 


Now check that your output agrees with ours, as shown below. 


> start(AP.elec)
[1] 1958 1
> end(AP.elec)
[1] 1960 12
> AP.elec[1:3, ]
      AP Elec.ts
[1,] 340    1497
[2,] 318    1463
[3,] 362    1648
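If the online file is not to hand, the behaviour of ts.intersect can be checked with a simulated series in place of the electricity data. In the sketch below, Other.ts and Both are hypothetical names and the simulated values carry no meaning.

```r
AP <- AirPassengers   # January 1949 to December 1960

# A simulated monthly series covering January 1958 to December 1962
set.seed(1)
Other.ts <- ts(rnorm(60), start = c(1958, 1), frequency = 12)

# Only the period common to both series is retained
Both <- ts.intersect(AP, Other.ts)
start(Both); end(Both)
dim(Both)   # 36 months by 2 series
```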


In the code below, the data for each series are extracted and plotted
(Fig. 1.7).⁷


> AP <- AP.elec[,1]; Elec <- AP.elec[,2]
> layout(1:2)
> plot(AP, main = "", ylab = "Air passengers / 1000's")
> plot(Elec, main = "", ylab = "Electricity production / MkWh")
> plot(as.vector(AP), as.vector(Elec),
    xlab = "Air passengers / 1000's",
    ylab = "Electricity production / MkWh")
> abline(reg = lm(Elec ~ AP))
> cor(AP, Elec)
[1] 0.884


In the plot function above, as.vector is needed to convert the ts objects to
ordinary vectors suitable for a scatter plot.

⁷ R is case sensitive, so lowercase is used here to represent the shorter record of air
passenger data. In the code, we have also used the argument main="" to suppress
unwanted titles.
Fig. 1.7. International air passengers and Australian electricity production for the
period 1958-1960. The plots look similar because both series have an increasing 
trend and a seasonal cycle. However, this does not imply that there exists a causal 
relationship between the variables. 


The two time series are highly correlated, as can be seen in the plots, with a 
correlation coefficient of 0.88. Correlation will be discussed more in Chapter 2, 
but for the moment observe that the two time plots look similar (Fig. 1.7) and 
that the scatter plot shows an approximate linear association between the two 
variables (Fig. 1.8). However, it is important to realise that correlation does 
not imply causation. In this case, it is not plausible that higher numbers of 
air passengers in the United States cause, or are caused by, higher electricity 
production in Australia. A reasonable explanation for the correlation is that 
the increasing prosperity and technological development in both countries over 
this period accounts for the increasing trends. The two time series also happen 
to have similar seasonal variations. For these reasons, it is usually appropriate 
to remove trends and seasonal effects before comparing multiple series. This 
is often achieved by working with the residuals of a regression model that has 
deterministic terms to represent the trend and seasonal effects (Chapter 5). 
In the simplest cases, the residuals can be modelled as independent random
variation from a single distribution, but much of the book is concerned with 
fitting more sophisticated models. 
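The effect of shared trends on correlation can be seen in a small simulation. In the sketch below, two series are generated independently apart from each having a linear trend; all names (Time, X, Y, X.res, Y.res) and parameter values are hypothetical, and regression on time is only one of several ways to remove a trend.

```r
# Two series that are unrelated except that both increase with time
set.seed(1)
Time <- 1:100
X <- 0.5 * Time + rnorm(100, sd = 5)
Y <- 0.3 * Time + rnorm(100, sd = 5)

# The raw series are strongly correlated because of the common trend
cor(X, Y)

# Remove the linear trends and correlate the residuals instead
X.res <- resid(lm(X ~ Time))
Y.res <- resid(lm(Y ~ Time))
cor(X.res, Y.res)
```

The correlation between the residuals should be much closer to zero than the correlation between the raw series.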
Fig. 1.8. Scatter plot of air passengers and Australian electricity production for
the period 1958-1960. The apparent linear relationship between the two variables
is misleading and a consequence of the trends in the series. 


1.4.4 Quarterly exchange rate: GBP to NZ dollar 


The trends and seasonal patterns in the previous two examples were clear 
from the plots. In addition, reasonable explanations could be put forward for 
the possible causes of these features. With financial data, exchange rates for 
example, such marked patterns are less likely to be seen, and different methods 
of analysis are usually required. A financial series may sometimes show a 
dramatic change that has a clear cause, such as a war or natural disaster. Day- 
to-day changes are more difficult to explain because the underlying causes are 
complex and impossible to isolate, and it will often be unrealistic to assume 
any deterministic component in the time series model. 

The exchange rates for British pounds sterling to New Zealand dollars 
for the period January 1991 to March 2000 are shown in Figure 1.9. The 
data are mean values taken over quarterly periods of three months, with the 
first quarter being January to March and the last quarter being October to 
December. They can be read into R from the book website and converted to 
a quarterly time series as follows: 


> www <- "http://www.massey.ac.nz/~pscowper/ts/pounds_nz.dat"
> Z <- read.table(www, header = T)
> Z[1:4, ]
[1] 2.92 2.94 3.17 3.25
> Z.ts <- ts(Z, st = 1991, fr = 4)
> plot(Z.ts, xlab = "time / years",
    ylab = "Quarterly exchange rate in $NZ / pound")


Short-term trends are apparent in the time series: After an initial surge 
ending in 1992, a negative trend leads to a minimum around 1996, which is 
followed by a positive trend in the second half of the series (Fig. 1.9). 

The trend seems to change direction at unpredictable times rather than 
displaying the relatively consistent pattern of the air passenger series and 
Australian production series. Such trends have been termed stochastic trends 
to emphasise this randomness and to distinguish them from more deterministic 
trends like those seen in the previous examples. A mathematical model known 
as a random walk can sometimes provide a good fit to data like these and is 
fitted to this series in §4.4.2. Stochastic trends are common in financial series 
and will be studied in more detail in Chapters 4 and 7. 
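
As a rough illustration of a stochastic trend, the sketch below simulates a random walk as the cumulative sum of independent normal steps. The seed, the series length, and the object names w, x and x.ts are arbitrary choices made here for illustration and are not part of the exchange rate analysis; the random walk model itself is defined in Chapter 4. Rerunning the code with a different seed should show apparent trends that change direction at unpredictable times.

```r
# Simulate a random walk: each value is the previous value plus white noise.
set.seed(1)                      # arbitrary seed, for reproducibility only
w <- rnorm(40)                   # 40 independent N(0, 1) steps (ten years of quarters)
x <- cumsum(w)                   # x[t] = x[t-1] + w[t], starting from zero
x.ts <- ts(x, start = 1991, frequency = 4)
plot(x.ts, xlab = "Time (years)", ylab = "Simulated random walk")
```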
Fig. 1.9. Quarterly exchange rates for the period 1991-2000.


Two local trends are emphasised when the series is partitioned into two
subseries based on the periods 1992-1996 and 1996-1998. The window function
can be used to extract the subseries:


> Z.92.96 <- window(Z.ts, start = c(1992, 1), end = c(1996, 1))
> Z.96.98 <- window(Z.ts, start = c(1996, 1), end = c(1998, 1))
> layout(1:2)
> plot(Z.92.96, ylab = "Exchange rate in $NZ/pound",
    xlab = "Time (years)")
> plot(Z.96.98, ylab = "Exchange rate in $NZ/pound",
    xlab = "Time (years)")

Now suppose we were observing this series at the start of 1996; i.e., we
had the data in Figure 1.10(a). It might have been tempting to predict a
continuation of the downward trend for future years. However, this would have
been a very poor prediction, as Figure 1.10(b) shows that the data started to
follow an increasing trend. Likewise, without additional information, it would
also be inadvisable to extrapolate the trend in Figure 1.10(b). This illustrates
the potential pitfall of inappropriate extrapolation of stochastic trends when
underlying causes are not properly understood. To reduce the risk of making
an inappropriate forecast, statistical tests, introduced in Chapter 7, can be
used to test for a stochastic trend.


[Two time plots: (a) Exchange rates for 1992-1996; (b) Exchange rates for
1996-1998. x-axis: Time (years).]

Fig. 1.10. Quarterly exchange rates for two periods. The plots indicate that without
additional information it would be inappropriate to extrapolate the trends.


1.4.5 Global temperature series 


A change in the world’s climate will have a major impact on the lives of
many people, as global warming is likely to lead to an increase in ocean levels
and natural hazards such as floods and droughts. It is likely that the world
economy will be severely affected as governments from around the globe try
to enforce a reduction in fossil fuel use and measures are taken to deal with
any increase in natural disasters.⁸

In climate change studies (e.g., see Jones and Moberg, 2003; Rayner et al.
2003), the following global temperature series, expressed as anomalies from
the monthly means over the period 1961-1990, plays a central role:⁹


> www <- "http://www.massey.ac.nz/~pscowper/ts/global.dat"
> Global <- scan(www)
> Global.ts <- ts(Global, st = c(1856, 1), end = c(2005, 12),
    fr = 12)
> Global.annual <- aggregate(Global.ts, FUN = mean)
> plot(Global.ts)
> plot(Global.annual)


It is the trend that is of most concern, so the aggregate function is used 
to remove any seasonal effects within each year and produce an annual series 
of mean temperatures for the period 1856 to 2005 (Fig. 1.11b). We can avoid 
explicitly dividing by 12 if we specify FUN=mean in the aggregate function. 
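
The equivalence mentioned above can be checked on a small artificial series. In this sketch the series x and the names annual.mean and annual.sum are made up for illustration; by default aggregate sums the values within each year, so dividing that result by 12 should agree with the result of FUN = mean.

```r
# Two years of artificial monthly data: 1, 2, ..., 24
x <- ts(1:24, start = c(2000, 1), frequency = 12)

annual.mean <- aggregate(x, FUN = mean)   # mean of the 12 months in each year
annual.sum  <- aggregate(x) / 12          # default FUN = sum, then divide by 12

annual.mean
all.equal(annual.mean, annual.sum)
```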

The upward trend from about 1970 onwards has been used as evidence
of global warming (Fig. 1.12). In the code below, the monthly time intervals
corresponding to the 36-year period 1970-2005 are extracted using the
time function and the associated observed temperature series extracted using
window. The data are plotted and a line superimposed using a regression of
temperature on the new time index (Fig. 1.12).


> New.series <- window(Global.ts, start = c(1970, 1), end = c(2005, 12))
> New.time <- time(New.series)
> plot(New.series); abline(reg = lm(New.series ~ New.time))


In the previous section, we discussed a potential pitfall of inappropriate
extrapolation. In climate change studies, a vital question is whether rising
temperatures are a consequence of human activity, specifically the burning
of fossil fuels and increased greenhouse gas emissions, or are a natural trend,
perhaps part of a longer cycle, that may decrease in the future without needing
a global reduction in the use of fossil fuels. We cannot attribute the increase in
global temperature to the increasing use of fossil fuels without invoking some
physical explanation¹⁰ because, as we noted in §1.4.3, two unrelated time
series will be correlated if they both contain a trend. However, as the general
consensus among scientists is that the trend in the global temperature series is
related to a global increase in greenhouse gas emissions, it seems reasonable to
acknowledge a causal relationship and to expect the mean global temperature
to continue to rise if greenhouse gas emissions are not reduced.¹¹


8 For general policy documents and discussions on climate change, see the website
(and links) for the United Nations Framework Convention on Climate Change at
http://unfccc.int.

9 The data are updated regularly and can be downloaded free of charge from the
Internet at: http://www.cru.uea.ac.uk/cru/data/.

10 For example, refer to US Energy Information Administration at
http://www.eia.doe.gov/emeu/aer/inter.html.


[Panel (b): Mean annual series: 1856 to 2005]

Fig. 1.11. Time plots of the global temperature series (°C).


[Time plot with fitted line. x-axis: Time; y-axis: temperature in °C.]

Fig. 1.12. Rising mean global temperatures, January 1970-December 2005.
According to the United Nations Framework Convention on Climate Change, the mean
global temperature is expected to continue to rise in the future unless greenhouse
gas emissions are reduced on a global scale.


1.5 Decomposition of series 


1.5.1 Notation 


So far, our analysis has been restricted to plotting the data and looking for 
features such as trend and seasonal variation. This is an important first step, 
but to progress we need to fit time series models, for which we require some 
notation. We represent a time series of length $n$ by $\{x_t : t = 1, \ldots, n\} =
\{x_1, x_2, \ldots, x_n\}$. It consists of $n$ values sampled at discrete times $1, 2, \ldots, n$.
The notation will be abbreviated to $\{x_t\}$ when the length $n$ of the series
does not need to be specified. The time series model is a sequence of random
variables, and the observed time series is considered a realisation from the
model. We use the same notation for both and rely on the context to make
the distinction.¹² An overline is used for sample means:


$$\bar{x} = \sum x_i / n \tag{1.1}$$


The ‘hat’ notation will be used to represent a prediction or forecast. For
example, with the series $\{x_t : t = 1, \ldots, n\}$, $\hat{x}_{t+k|t}$ is a forecast made at time
$t$ for a future value at time $t + k$. A forecast is a predicted future value, and
the number of time steps into the future is the lead time ($k$). Following our
convention for time series notation, $\hat{x}_{t+k|t}$ can be the random variable or the
numerical value, depending on the context.


1.5.2 Models 


As the first two examples showed, many series are dominated by a trend
and/or seasonal effects, so the models in this section are based on these
components. A simple additive decomposition model is given by


$$x_t = m_t + s_t + z_t \tag{1.2}$$


where, at time $t$, $x_t$ is the observed series, $m_t$ is the trend, $s_t$ is the seasonal
effect, and $z_t$ is an error term that is, in general, a sequence of correlated
random variables with mean zero. In this section, we briefly outline two main
approaches for extracting the trend $m_t$ and the seasonal effect $s_t$ in Equation
(1.2) and give the main R functions for doing this.


11 Refer to http://unfccc.int.

12 Some books do distinguish explicitly by using lowercase for the time series and
uppercase for the model.


If the seasonal effect tends to increase as the trend increases, a multiplicative
model may be more appropriate:


$$x_t = m_t \cdot s_t + z_t \tag{1.3}$$


If the random variation is modelled by a multiplicative factor and the variable
is positive, an additive decomposition model for $\log(x_t)$ can be used:¹³


$$\log(x_t) = m_t + s_t + z_t \tag{1.4}$$


Some care is required when the exponential function is applied to the predicted
mean of $\log(x_t)$ to obtain a prediction for the mean value $x_t$, as the effect is
usually to bias the predictions. If the random series $z_t$ are normally distributed
with mean 0 and variance $\sigma^2$, then the predicted mean value at time $t$ based
on Equation (1.4) is given by


$$\hat{x}_t = e^{m_t + s_t} e^{\frac{1}{2}\sigma^2} \tag{1.5}$$


However, if the error series is not normally distributed and is negatively
skewed,¹⁴ as is often the case after taking logarithms, the bias correction
factor will be an overcorrection (Exercise 4) and it is preferable to apply an
empirical adjustment (which is discussed further in Chapter 5). The issue is
of practical importance. For example, if we make regular financial forecasts
without applying an adjustment, we are likely to consistently underestimate
mean costs.
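
The size of the bias can be seen in a small simulation. The sketch below is only an illustration under assumed values: the seed, the error standard deviation sigma = 0.5, and a fixed value of 1 for the sum of trend and seasonal effect (called m.plus.s here) are all arbitrary. With normally distributed errors, simply exponentiating the mean of the logged series should fall short of the sample mean, while the factor in Equation (1.5) should bring the prediction close to it.

```r
set.seed(123)                       # arbitrary seed
sigma <- 0.5                        # assumed standard deviation of z_t
m.plus.s <- 1                       # assumed fixed value of m_t + s_t
z <- rnorm(1e5, mean = 0, sd = sigma)
x <- exp(m.plus.s + z)              # so that log(x) = m_t + s_t + z_t

naive     <- exp(m.plus.s)                      # exponentiate the mean of log(x)
corrected <- exp(m.plus.s) * exp(sigma^2 / 2)   # bias correction of Equation (1.5)
c(sample.mean = mean(x), naive = naive, corrected = corrected)
```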


1.5.3 Estimating trends and seasonal effects 


There are various ways to estimate the trend $m_t$ at time $t$, but a relatively
simple procedure, which is available in R and does not assume any specific
form, is to calculate a moving average centred on $x_t$. A moving average is
an average of a specified number of time series values around each value in
the time series, with the exception of the first few and last few terms. In this
context, the length of the moving average is chosen to average out the seasonal
effects, which can be estimated later. For monthly series, we need to average
twelve consecutive months, but there is a slight snag. Suppose our time series
begins at January ($t = 1$) and we average January up to December ($t = 12$).
This average corresponds to a time $t = 6.5$, between June and July. When we
come to estimate seasonal effects, we need a moving average at integer times.
This can be achieved by averaging the average of January up to December
and the average of February ($t = 2$) up to January ($t = 13$). This average of
two moving averages corresponds to $t = 7$, and the process is called centring.
Thus the trend at time $t$ can be estimated by the centred moving average


$$\hat{m}_t = \frac{\frac{1}{2}x_{t-6} + x_{t-5} + \ldots + x_{t-1} + x_t + x_{t+1} + \ldots + x_{t+5} + \frac{1}{2}x_{t+6}}{12} \tag{1.6}$$


where $t = 7, \ldots, n - 6$. The coefficients in Equation (1.6) for each month
are 1/12 (or sum to 1/12 in the case of the first and last coefficients), so that
equal weight is given to each month and the coefficients sum to 1. By using the
seasonal frequency for the coefficients in the moving average, the procedure
generalises for any seasonal frequency (e.g., quarterly series), provided the
condition that the coefficients sum to unity is still met.

An estimate of the monthly additive effect ($s_t$) at time $t$ can be obtained
by subtracting $\hat{m}_t$:


$$\hat{s}_t = x_t - \hat{m}_t \tag{1.7}$$


13 To be consistent with R, we use log for the natural logarithm, which is often
written ln.

14 A probability distribution is negatively skewed if its density has a long tail to the
left.


By averaging these estimates of the monthly effects for each month, we obtain
a single estimate of the effect for each month. If the period of the time series
is a whole number of years, the number of monthly effects averaged for each
month is one less than the number of years of record. At this stage, the twelve
monthly additive components should have an average value close to, but not
usually exactly equal to, zero. It is usual to adjust them by subtracting this
mean so that they do average zero. If the monthly effect is multiplicative, the
estimate is given by division; i.e., $\hat{s}_t = x_t/\hat{m}_t$. It is usual to adjust monthly
multiplicative factors so that they average unity. The procedure generalises,
using the same principle, to any seasonal frequency.
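
The steps above can be sketched directly in R. This is a minimal illustration, assuming the monthly AirPassengers series that ships with R as a stand-in data set; the names wts, m.hat, s.hat and s.bar are introduced here for convenience. The centred moving average of Equation (1.6) is applied with filter, the additive effects of Equation (1.7) are averaged by calendar month, and the results are compared with those returned by decompose.

```r
x <- AirPassengers                       # monthly series shipped with R, starts in January

# Centred moving average, Equation (1.6): weights 1/24, 1/12, ..., 1/12, 1/24
wts <- c(0.5, rep(1, 11), 0.5) / 12
m.hat <- stats::filter(x, filter = wts, sides = 2)   # NA for the first and last 6 months

# Additive seasonal estimates, Equation (1.7), averaged by calendar month
s.hat <- x - m.hat
s.bar <- tapply(as.numeric(s.hat), as.numeric(cycle(x)), mean, na.rm = TRUE)
s.bar <- s.bar - mean(s.bar)             # adjust so the twelve effects average zero

# Compare with the built-in decomposition
dec <- decompose(x)
all.equal(as.numeric(m.hat), as.numeric(dec$trend))
all.equal(as.numeric(s.bar), as.numeric(dec$figure))
```

The weights sum to one, and the first and last six values of the estimated trend are missing because the average cannot be centred there.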

It is common to present economic indicators, such as unemployment per- 
centages, as seasonally adjusted series. This highlights any trend that might 
otherwise be masked by seasonal variation attributable, for instance, to the 
end of the academic year, when school and university leavers are seeking work. 
If the seasonal effect is additive, a seasonally adjusted series is given by $x_t - \bar{s}_t$,
whilst if it is multiplicative, an adjusted series is obtained from $x_t/\bar{s}_t$, where
$\bar{s}_t$ is the seasonally adjusted mean for the month corresponding to time $t$.
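
A seasonally adjusted series can be obtained in a couple of lines once the seasonal factors are available. The sketch below assumes the AirPassengers series shipped with R and a multiplicative seasonal effect, and uses the seasonal component returned by decompose; the name x.adj is introduced here for illustration.

```r
x <- AirPassengers
dec <- decompose(x, type = "multiplicative")
mean(dec$figure)                 # the twelve adjusted seasonal factors average one
x.adj <- x / dec$seasonal        # seasonally adjusted series
ts.plot(cbind(x, x.adj), lty = 1:2)
```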


1.5.4 Smoothing 


The centred moving average is an example of a smoothing procedure that is
applied retrospectively to a time series with the objective of identifying an un- 
derlying signal or trend. Smoothing procedures can, and usually do, use points 
before and after the time at which the smoothed estimate is to be calculated. 
A consequence is that the smoothed series will have some points missing at 
the beginning and the end unless the smoothing algorithm is adapted for the 
end points. 

A second smoothing algorithm offered by R is stl. This uses a locally
weighted regression technique known as loess. The regression, which can be
a line or higher polynomial, is referred to as local because it uses only some
relatively small number of points on either side of the point at which the
smoothed estimate is required. The weighting reduces the influence of outlying
points and is an example of robust regression. Although the principles behind
stl are straightforward, the details are quite complicated.
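
A minimal call to stl might look as follows. The choice of the AirPassengers series, the log transformation, and the setting s.window = "periodic" are assumptions made for this sketch; stl requires a value for s.window, and other settings may suit other data better.

```r
x <- log(AirPassengers)                   # log scale so an additive model is plausible
fit <- stl(x, s.window = "periodic")      # "periodic" fixes the seasonal pattern over time
plot(fit)

# The fitted object stores the three components as columns of a matrix
colnames(fit$time.series)
recon <- rowSums(fit$time.series)         # seasonal + trend + remainder
all.equal(as.numeric(recon), as.numeric(x))
```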

Smoothing procedures such as the centred moving average and loess do 
not require a predetermined model, but they do not produce a formula that 
can be extrapolated to give forecasts. Fitting a line to model a linear trend 
has an advantage in this respect. 

The term filtering is also used for smoothing, particularly in the engi- 
neering literature. A more specific use of the term filtering is the process of 
obtaining the best estimate of some variable now, given the latest measure- 
ment of it and past measurements. The measurements are subject to random 
error and are described as being corrupted by noise. Filtering is an important 
part of control algorithms which have a myriad of applications. An exotic ex- 
ample is the Huygens probe leaving the Cassini orbiter to land on Saturn’s 
largest moon, Titan, on January 14, 2005. 


1.5.5 Decomposition in R 


In R, the function decompose estimates trends and seasonal effects using
a moving average method. Nesting the function within plot (e.g., using
plot(stl())) produces a single figure showing the original series $x_t$ and the
decomposed series $m_t$, $s_t$, and $z_t$. For example, with the electricity data,
additive and multiplicative decomposition plots are given by the commands below;
the last plot, which uses lty to give different line types, is the superposition
of the seasonal effect on the trend (Fig. 1.13).


> plot(decompose(Elec.ts))
> Elec.decom <- decompose(Elec.ts, type = "mult")
> plot(Elec.decom)
> Trend <- Elec.decom$trend
> Seasonal <- Elec.decom$seasonal
> ts.plot(cbind(Trend, Trend * Seasonal), lty = 1:2)


Fig. 1.13. Electricity production data: trend with superimposed multiplicative
seasonal effects.


[Plot title: Decomposition of multiplicative time series]

Fig. 1.14. Decomposition of the electricity production data.


In this example, the multiplicative model would seem more appropriate
than the additive model because the variance of the original series and trend
increase with time (Fig. 1.14). However, the random component, which
corresponds to $z_t$, also has an increasing variance, which indicates that a
log-transformation (Equation (1.4)) may be more appropriate for this series (Fig.
1.14). The random series obtained from the decompose function is not
precisely a realisation of the random process $z_t$ but rather an estimate of that
realisation. It is an estimate because it is obtained from the original time
series using estimates of the trend and seasonal effects. This estimate of the
realisation of the random process is a residual error series. However, we treat
it as a realisation of the random process.
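
The way the random component is derived from the other estimates can be checked directly. In the sketch below the AirPassengers series is used as a stand-in for Elec.ts, and the names resid.est, rand and half are introduced for illustration. For a multiplicative decomposition, the random component is the original series divided by the product of the estimated trend and seasonal effect, and comparing its spread in the two halves of the record gives a rough indication of whether its variance changes over time.

```r
x <- AirPassengers                        # stand-in for Elec.ts
dec <- decompose(x, type = "mult")
resid.est <- x / (dec$trend * dec$seasonal)
all.equal(as.numeric(resid.est), as.numeric(dec$random))

# Compare the spread of the estimated random component in the two halves of the record
rand <- na.omit(as.numeric(dec$random))
half <- length(rand) %/% 2
c(first.half = sd(rand[1:half]), second.half = sd(rand[(half + 1):length(rand)]))
```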

There are many other reasonable methods for decomposing time series,
and we cover some of these in Chapter 5 when we study regression methods.


1.6 Summary of commands used in examples


read.table     reads data into a data frame
attach         makes names of column variables available
ts             produces a time series object
aggregate      creates an aggregated series
ts.plot        produces a time plot for one or more series
window         extracts a subset of a time series
time           extracts the time from a time series object
ts.intersect   creates the intersection of one or more time series
cycle          returns the season for each value in a series
decompose      decomposes a series into the components
               trend, seasonal effect, and residual
stl            decomposes a series using loess smoothing
summary        summarises an R object


1.7 Exercises 


1. Carry out the following exploratory time series analysis in R using either
the chocolate or the beer production data from §1.4.3.

a) Produce a time plot of the data. Plot the aggregated annual series and 
a boxplot that summarises the observed values for each season, and 
comment on the plots. 

b) Decompose the series into the components trend, seasonal effect, and 
residuals, and plot the decomposed series. Produce a plot of the trend 
with a superimposed seasonal effect. 


2. Many economic time series are based on indices. A price index is the 
ratio of the cost of a basket of goods now to its cost in some base year. 
In the Laspeyre formulation, the basket is based on typical purchases in 
the base year. You are asked to calculate an index of motoring cost from 
the following data. The clutch represents all mechanical parts, and the 
quantity allows for this. 


item             quantity '00   unit price '00   quantity '04   unit price '04
(i)              ($q_{i0}$)     ($p_{i0}$)       ($q_{it}$)     ($p_{it}$)

car              0.33           18 000           0.5            20 000
petrol (litre)   2000           0.80             1500           1.60
servicing (h)    40             40               20             60
tyre             3              80               2              120
clutch           2              200              1              360


The Laspeyre Price Index at time $t$ relative to base year 0 is


$$LI_t = \frac{\sum q_{i0} p_{it}}{\sum q_{i0} p_{i0}}$$


Calculate the $LI_t$ for 2004 relative to 2000.


3. The Paasche Price Index at time $t$ relative to base year 0 is


$$PI_t = \frac{\sum q_{it} p_{it}}{\sum q_{it} p_{i0}}$$


a) Use the data above to calculate the $PI_t$ for 2004 relative to 2000.

b) Explain why the $PI_t$ is usually lower than the $LI_t$.

c) Calculate the Irving-Fisher Price Index as the geometric mean of $LI_t$
and $PI_t$. (The geometric mean of a sample of n items is the nth root
of their product.)


4. A standard procedure for finding an approximate mean and variance of a
function of a variable is to use a Taylor expansion for the function about
the mean of the variable. Suppose the variable is $y$ and that its mean and
standard deviation are $\mu$ and $\sigma$ respectively.


$$\phi(y) = \phi(\mu) + \phi'(\mu)(y - \mu) + \phi''(\mu)\frac{(y - \mu)^2}{2!} + \phi'''(\mu)\frac{(y - \mu)^3}{3!} + \ldots$$


Consider the case of $\phi(\cdot)$ as $e^y$. By taking the expectation of both sides
of this equation, explain why the bias correction factor given in Equation
(1.5) is an overcorrection if the residual series has a negative skewness,
where the skewness $\gamma$ of a random variable $y$ is defined by


$$\gamma = \frac{E\left[(y - \mu)^3\right]}{\sigma^3}$$


Correlation


2.1 Purpose 


Once we have identified any trend and seasonal effects, we can deseasonalise 
the time series and remove the trend. If we use the additive decomposition 
method of §1.5, we first calculate the seasonally adjusted time series and 
then remove the trend by subtraction. This leaves the random component, 
but the random component is not necessarily well modelled by independent 
random variables. In many cases, consecutive variables will be correlated. If 
we identify such correlations, we can improve our forecasts, quite dramatically 
if the correlations are high. We also need to estimate correlations if we are 
to generate realistic time series for simulations. The correlation structure of a 
time series model is defined by the correlation function, and we estimate this 
from the observed time series. 

Plots of serial correlation (the ‘correlogram’, defined later) are also used 
extensively in signal processing applications. The paradigm is an underlying 
deterministic signal corrupted by noise. Signals from yachts, ships, aeroplanes, 
and space exploration vehicles are examples. At the beginning of 2007, NASA’s 
twin Voyager spacecraft were sending back radio signals from the frontier of 
our solar system, including evidence of hollows in the turbulent zone near the 
edge. 


2.2 Expectation and the ensemble 


2.2.1 Expected value 


The expected value, commonly abbreviated to expectation, $E$, of a variable,
or a function of a variable, is its mean value in a population. So $E(x)$ is the
mean of $x$, denoted $\mu$,¹ and $E[(x - \mu)^2]$ is the mean of the squared deviations
about $\mu$, better known as the variance $\sigma^2$ of $x$.² The standard deviation, $\sigma$, is
the square root of the variance. If there are two variables $(x, y)$, the variance
may be generalised to the covariance, $\gamma(x, y)$. Covariance is defined by


$$\gamma(x, y) = E[(x - \mu_x)(y - \mu_y)] \tag{2.1}$$


The covariance is a measure of linear association between two variables
$(x, y)$. In §1.4.3, we emphasised that a linear association between variables
does not imply causality.

Sample estimates are obtained by adding the appropriate function of the
individual data values and division by $n$ or, in the case of variance and
covariance, $n - 1$, to give unbiased estimators.³ For example, if we have $n$ data
pairs, $(x_i, y_i)$, the sample covariance is given by


$$\mathrm{Cov}(x, y) = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1) \tag{2.2}$$


If the data pairs are plotted, the lines $x = \bar{x}$ and $y = \bar{y}$ divide the plot into
quadrants. Points in the lower left quadrant have both $(x_i - \bar{x})$ and $(y_i - \bar{y})$
negative, so the product that contributes to the covariance is positive. Points in
the upper right quadrant also make a positive contribution. In contrast, points
in the upper left and lower right quadrants make a negative contribution to the
covariance. Thus, if $y$ tends to increase when $x$ increases, most of the points
will be in the lower left and upper right quadrants and the covariance will
be positive. Conversely, if $y$ tends to decrease as $x$ increases, the covariance
will be negative. If there is no such linear association, the covariance will be
small relative to the standard deviations of $\{x_i\}$ and $\{y_i\}$; always check the
plot in case there is a quadratic association or some other pattern. In R we
can calculate a sample covariance, with denominator $n - 1$, from its definition
or by using the function cov. If we use the mean function, we are implicitly
dividing by $n$.

Benzoapyrene is a carcinogenic hydrocarbon that is a product of incomplete
combustion. One source of benzoapyrene and carbon monoxide is automobile
exhaust. Colucci and Begeman (1971) analysed sixteen air samples
from Herald Square in Manhattan and recorded the carbon monoxide
concentration ($x$, in parts per million) and benzoapyrene concentration ($y$, in
micrograms per thousand cubic metres) for each sample. The data are plotted
in Figure 2.1.


1 A more formal definition of the expectation $E$ of a function $\phi(x, y)$ of continuous
random variables $x$ and $y$, with a joint probability density function $f(x, y)$, is the
mean value for $\phi$ obtained by integrating over all possible values of $x$ and $y$:

$$E[\phi(x, y)] = \int_y \int_x \phi(x, y) f(x, y)\, dx\, dy$$

Note that the mean of $x$ is obtained as the special case $\phi(x, y) = x$.

2 For more than one variable, subscripts can be used to distinguish between the
properties; e.g., for the means we may write $\mu_x$ and $\mu_y$ to distinguish between
the mean of $x$ and the mean of $y$.

3 An estimator is unbiased for a population parameter if its average value, in
infinitely repeated samples of size $n$, equals that population parameter. If an
estimator is unbiased, its value in a particular sample is referred to as an unbiased
estimate.


[Scatter plot. x-axis: CO.]

Fig. 2.1. Sixteen air samples from Herald Square.


> www <- "http://www.massey.ac.nz/~pscowper/ts/Herald.dat"
> Herald.dat <- read.table(www, header = TRUE)
> attach(Herald.dat)


We now use R to calculate the covariance for the Herald Square pairs in
three different ways:


> x <- CO; y <- Benzoa; n <- length(x)
> sum((x - mean(x))*(y - mean(y))) / (n - 1)
[1] 5.51
> mean((x - mean(x)) * (y - mean(y)))
[1] 5.17
> cov(x, y)
[1] 5.51


The correspondence between the R code above and the expectation definition
of covariance should be noted:


mean((x - mean(x))*(y - mean(y)))  →  $E[(x - \mu_x)(y - \mu_y)]$    (2.3)


Given this correspondence, the more natural estimate of covariance would
be mean((x - mean(x))*(y - mean(y))). However, as can be seen above,
the values computed using the internal function cov are those obtained using
sum with a denominator of $n - 1$. As $n$ gets large, the difference in denominators
becomes less noticeable and the more natural estimate asymptotically
approaches the unbiased estimate.⁴

Correlation is a dimensionless measure of the linear association between
a pair of variables $(x, y)$ and is obtained by standardising the covariance by
dividing it by the product of the standard deviations of the variables.
Correlation takes a value between $-1$ and $+1$, with a value of 0 indicating no linear
association. The population correlation, $\rho$, between a pair of variables $(x, y)$
is defined by


$$\rho(x, y) = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} = \frac{\gamma(x, y)}{\sigma_x \sigma_y} \tag{2.4}$$

The sample correlation, Cor, is an estimate of $\rho$ and is calculated as


$$\mathrm{Cor}(x, y) = \frac{\mathrm{Cov}(x, y)}{\mathrm{sd}(x)\,\mathrm{sd}(y)} \tag{2.5}$$


In R, the sample correlation for pairs $(x_i, y_i)$ stored in vectors x and y is
cor(x,y). A value of $+1$ or $-1$ indicates an exact linear association, with the
$(x, y)$ pairs falling on a straight line of positive or negative slope, respectively.
The correlation between the CO and benzoapyrene measurements at Herald
Square is now calculated both from the definition and using cor.


> cov(x,y) / (sd(x)*sd(y)) 
[1] 0.3551 
> cor(x,y) 
[1] 0.3551 


Although the correlation is small, there is nevertheless a physical expla- 
nation for the correlation because both products are a result of incomplete 
combustion. A correlation of 0.36 typically corresponds to a slight visual im- 
pression that y tends to increase as x increases, although the points will be 
well scattered. 


2.2.2 The ensemble and stationarity


The mean function of a time series model is


$$\mu(t) = E(x_t) \tag{2.6}$$


and, in general, is a function of $t$. The expectation in this definition is an
average taken across the ensemble of all the possible time series that might
have been produced by the time series model (Fig. 2.2). The ensemble
constitutes the entire population. If we have a time series model, we can simulate
more than one time series (see Chapter 4). However, with historical data, we
usually only have a single time series so all we can do, without assuming a
mathematical structure for the trend, is to estimate the mean at each sample
point by the corresponding observed value. In practice, we make estimates of
any apparent trend and seasonal effects in our data and remove them, using
decompose for example, to obtain time series of the random component. Then
time series models with a constant mean will be appropriate.


4 In statistics, asymptotically means as the sample size approaches infinity.

If the mean function is constant, we say that the time series model is
stationary in the mean. The sample estimate of the population mean, $\mu$, is
the sample mean, $\bar{x}$:


$$\bar{x} = \sum_{t=1}^{n} x_t / n \tag{2.7}$$


Equation (2.7) does rely on an assumption that a sufficiently long time series 
characterises the hypothetical model. Such models are known as ergodic, and 
the models in this book are all ergodic. 


2.2.3 Ergodic series* 


A time series model that is stationary in the mean is ergodic in the mean if 
the time average for a single time series tends to the ensemble mean as the 
length of the time series increases: 


$$\lim_{n \to \infty} \frac{\sum x_t}{n} = \mu \tag{2.8}$$


This implies that the time average is independent of the starting point. Given 
that we usually only have a single time series, you may wonder how a time 
series model can fail to be ergodic, or why we should want a model that is 
not ergodic. Environmental and economic time series are single realisations of 
a hypothetical time series model, and we simply define the underlying model 
as ergodic. 

There are, however, cases in which we can have many time series arising 
from the same time series model. Suppose we investigate the acceleration at 
the pilot seat of a new design of microlight aircraft in simulated random gusts 
in a wind tunnel. Even if we have built two prototypes to the same design, 
we cannot be certain they will have the same average acceleration response 
because of slight differences in manufacture. In such cases, the number of time 
series is equal to the number of prototypes. Another example is an experiment 
investigating turbulent flows in some complex system. It is possible that we 
will obtain qualitatively different results from different runs because they do 
depend on initial conditions. It would seem better to run an experiment in- 
volving turbulence many times than to run it once for a much longer time. 
The number of runs is the number of time series. It is straightforward to adapt
a stationary time series model to be non-ergodic by defining the means for
the individual time series to be from some probability distribution.


Fig. 2.2. An ensemble of time series. The expected value $E(x_t)$ at a particular time
$t$ is the average taken over the entire population.
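
A non-ergodic model of this kind can be sketched by simulation. The values below (500 series of length 2000, series means drawn from a normal distribution with mean 0 and standard deviation 2, and unit-variance white noise about each mean) are arbitrary assumptions for illustration, as are the object names. The time average of each series should settle near that series' own mean rather than near the ensemble mean of zero, whereas the average across the ensemble at any fixed time should be near zero.

```r
set.seed(42)                              # arbitrary seed
n.series <- 500                           # number of realisations in the ensemble
n.time <- 2000                            # length of each series

# Each series has its own mean, drawn from N(0, 2^2); the ensemble mean is 0
mu.i <- rnorm(n.series, mean = 0, sd = 2)
ens <- sapply(mu.i, function(m) m + rnorm(n.time))   # n.time x n.series matrix

time.avg <- colMeans(ens)                 # time average of each individual series
ens.avg <- rowMeans(ens)                  # ensemble average at each time point

range(ens.avg)                            # near the ensemble mean of 0
sd(time.avg)                              # near 2: the time averages do not settle on 0
```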

2.2.4 Variance function


The variance function of a time series model that is stationary in the mean is


$$\sigma^2(t) = E[(x_t - \mu)^2] \tag{2.9}$$


which can, in principle, take a different value at every time $t$. But we cannot
estimate a different variance at each time point from a single time series. To
progress, we must make some simplifying assumption. If we assume the model
is stationary in the variance, this constant population variance, $\sigma^2$, can be
estimated from the sample variance:


$$\mathrm{Var}(x) = \frac{\sum (x_t - \bar{x})^2}{n - 1} \tag{2.10}$$


In a time series analysis, sequential observations may be correlated. If the
correlation is positive, Var(x) will tend to underestimate the population variance
in a short series because successive observations tend to be relatively similar.
In most cases, this does not present a problem since the bias decreases rapidly
as the length n of the series increases.
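
The underestimation in short series can be illustrated by simulation. The sketch below assumes a first-order autoregressive model with coefficient 0.8 and unit-variance noise, simulated with arima.sim; this model is not introduced until later in the book and is used here only as a convenient source of positively correlated data. The seed, the series length of 10, and the number of replicates are arbitrary.

```r
set.seed(7)                               # arbitrary seed
phi <- 0.8                                # assumed lag 1 autoregressive coefficient
true.var <- 1 / (1 - phi^2)               # population variance when the noise variance is 1

# Sample variance of many short series (n = 10) from the same model
v <- replicate(2000, var(as.numeric(arima.sim(n = 10, model = list(ar = phi)))))
c(mean.sample.var = mean(v), population.var = true.var)
```

Increasing the series length in the call to arima.sim should bring the average sample variance closer to the population value.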


2.2.5 Autocorrelation 


The mean and variance play an important role in the study of statistical 
distributions because they summarise two key distributional properties — a 
central location and the spread. Similarly, in the study of time series models, 
a key role is played by the second-order properties, which include the mean, 
variance, and serial correlation (described below). 

Consider a time series model that is stationary in the mean and the vari- 
ance. The variables may be correlated, and the model is second-order sta- 
tionary if the correlation between variables depends only on the number of 
time steps separating them. The number of time steps between the variables 
is known as the lag. A correlation of a variable with itself at different times 
is known as autocorrelation or serial correlation. If a time series model is 
second-order stationary, we can define an autocovariance function (acvf), $\gamma_k$,
as a function of the lag k:

$$\gamma_k = E\left[(x_t - \mu)(x_{t+k} - \mu)\right] \qquad (2.11)$$

The function $\gamma_k$ does not depend on t because the expectation, which is across
the ensemble, is the same at all times t. This definition follows naturally from
Equation (2.1) by replacing x with $x_t$ and y with $x_{t+k}$ and noting that the
mean $\mu$ is the mean of both $x_t$ and $x_{t+k}$. The lag k autocorrelation function
(acf), $\rho_k$, is defined by

$$\rho_k = \frac{\gamma_k}{\sigma^2} \qquad (2.12)$$

It follows from the definition that $\rho_0$ is 1.

It is possible to set up a second-order stationary time series model that 
has skewness; for example, one that depends on time t. Applications for such 
models are rare, and it is customary to drop the term ‘second-order’ and 
use ‘stationary’ on its own for a time series model that is at least second- 
order stationary. The term strictly stationary is reserved for more rigorous 
conditions. 

The acvf and acf can be estimated from a time series by their sample
equivalents. The sample acvf, $c_k$, is calculated as

$$c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \qquad (2.13)$$

Note that the autocovariance at lag 0, $c_0$, is the variance calculated with a
denominator n. Also, a denominator n is used when calculating $c_k$, although
only $n - k$ terms are added to form the numerator. Adopting this definition
constrains all sample autocorrelations to lie between $-1$ and 1. The sample
acf is defined as

$$r_k = \frac{c_k}{c_0} \qquad (2.14)$$
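
Equations (2.13) and (2.14) can be checked by direct calculation. The following is a minimal sketch rather than part of the original example: it uses the built-in LakeHuron series purely as a stand-in data set, and the helper name sample.acvf is our own.

```r
# Sketch: sample acvf and acf computed directly from Equations (2.13) and (2.14).
x <- as.numeric(LakeHuron)       # built-in annual series, used only as a stand-in
n <- length(x)
xbar <- mean(x)

sample.acvf <- function(k) {
  # sum of the n - k lagged products, divided by n (not by n - k)
  sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
}

c0 <- sample.acvf(0)
c1 <- sample.acvf(1)
r1 <- c1 / c0

# compare with the values returned by acf()
all.equal(c1, acf(x, type = "covariance", plot = FALSE)$acf[2])
all.equal(r1, acf(x, plot = FALSE)$acf[2])
all.equal(c0, var(x) * (n - 1) / n)   # lag 0 value is the variance with denominator n
```

The three comparisons should all return TRUE if acf uses the denominator n, as described above.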
We will demonstrate the calculations in R using a time series of wave 
heights (mm relative to still water level) measured at the centre of a wave tank. 
The sampling interval is 0.1 second and the record length is 39.7 seconds. The 
waves were generated by a wave maker driven by a pseudo-random signal that 
was programmed to emulate a rough sea. There is no trend and no seasonal 
period, so it is reasonable to suppose the time series is a realisation of a 
stationary process.

> www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
> wave.dat <- read.table(www, header = TRUE) ; attach(wave.dat)
> plot(ts(waveht)) ; plot(ts(waveht[1:60]))


The upper plot in Figure 2.3 shows the entire time series. There are no outlying 
values. The lower plot is of the first sixty wave heights. We can see that there 
is a tendency for consecutive values to be relatively similar and that the form 
is like a rough sea, with a quasi-periodicity but no fixed frequency.

Fig. 2.3. Wave height at centre of tank sampled at 0.1 second intervals.
Panel (b): wave height over 6 seconds.

The autocorrelations of x are stored in the vector acf(x)$acf, with the
lag k autocorrelation located in acf(x)$acf[k+1]. For example, the lag 1
autocorrelation for waveht is

> acf(waveht)$acf[2]
[1] 0.47


The first entry, acf(waveht)$acf[1], is $r_0$ and equals 1. A scatter plot, such
as Figure 2.1 for the Herald Square data, complements the calculation of
the correlation and alerts us to any non-linear patterns. In a similar way,
we can draw a scatter plot corresponding to each autocorrelation. For
example, for lag 1 we plot(waveht[1:396], waveht[2:397]) to obtain Figure
2.4. Autocovariances are obtained by adding an argument to acf. The lag 1
autocovariance is given by

> acf(waveht, type = c("covariance"))$acf[2]
[1] 33328

Fig. 2.4. Wave height pairs separated by a lag of 1.


2.3 The correlogram 


2.3.1 General discussion 


By default, the acf function produces a plot of $r_k$ against k, which is called
the correlogram. For example, Figure 2.5 gives the correlogram for the wave
heights obtained from acf(waveht).

Fig. 2.5. Correlogram of wave heights.

In general, correlograms have the following features:


• The x-axis gives the lag (k) and the y-axis gives the autocorrelation ($r_k$) at
  each lag. The unit of lag is the sampling interval, 0.1 second. Correlation
  is dimensionless, so there is no unit for the y-axis.

• If $\rho_k = 0$, the sampling distribution of $r_k$ is approximately normal, with a
  mean of $-1/n$ and a variance of $1/n$. The dotted lines on the correlogram
  are drawn at

  $$-\frac{1}{n} \pm \frac{2}{\sqrt{n}}$$

  If $r_k$ falls outside these lines, we have evidence against the null hypothesis
  that $\rho_k = 0$ at the 5% level. However, we should be careful about
  interpreting multiple hypothesis tests. Firstly, if $\rho_k$ does equal 0 at all lags k,
  we expect 5% of the estimates, $r_k$, to fall outside the lines. Secondly, the
  $r_k$ are correlated, so if one falls outside the lines, the neighbouring ones are
  more likely to be statistically significant. This will become clearer when
  we simulate time series in Chapter 4. In the meantime, it is worth looking
  for statistically significant values at specific lags that have some practical
  meaning (for example, the lag that corresponds to the seasonal period,
  when there is one). For monthly series, a significant autocorrelation at lag
  12 might indicate that the seasonal adjustment is not adequate.

• The lag 0 autocorrelation is always 1 and is shown on the plot. Its inclusion
  helps us compare values of the other autocorrelations relative to the
  theoretical maximum of 1. This is useful because, if we have a long time series,
  small values of $r_k$ that are of no practical consequence may be statistically
  significant. However, some discernment is required to decide what
  constitutes a noteworthy autocorrelation from a practical viewpoint. Squaring
  the autocorrelation can help, as this gives the percentage of variability
  explained by a linear relationship between the variables. For example, a
  lag 1 autocorrelation of 0.1 implies that a linear dependency of $x_t$ on $x_{t-1}$
  would only explain 1% of the variability of $x_t$. It is a common fallacy to
  treat a statistically significant result as important when it has almost no
  practical consequence.

• The correlogram for wave heights has a well-defined shape that appears
  like a sampled damped cosine function. This is typical of correlograms
  of time series generated by an autoregressive model of order 2. We cover
  autoregressive models in Chapter 4.
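
The limits described in the second point can be computed directly. The sketch below is an illustration only: it uses simulated independent values (an assumption, standing in for a series with no autocorrelation) of the same length as the wave height record, and counts how many sample autocorrelations fall outside the limits. The limits that R itself draws on the plot may be based on a slightly different constant.

```r
# Sketch: approximate 5% limits for r_k when rho_k = 0, applied to independent data.
set.seed(3)                               # arbitrary seed
n <- 397                                  # same length as the wave height series
w <- rnorm(n)                             # independent values, so rho_k = 0 for k > 0
r <- acf(w, lag.max = 40, plot = FALSE)$acf[-1]   # drop the lag 0 value

lower <- -1/n - 2/sqrt(n)
upper <- -1/n + 2/sqrt(n)
outside <- sum(r < lower | r > upper)     # number of lags flagged purely by chance
c(lower = lower, upper = upper, outside = outside)

# practical importance: the squared autocorrelation is the proportion of
# variability explained, so a lag 1 autocorrelation of 0.1 explains 1%
0.1^2
```

With 40 lags and a 5% level, around two values outside the limits would be expected by chance alone, which is why isolated exceedances should be interpreted cautiously.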


If you look back at the plot of the air passenger bookings, there is a clear 
seasonal pattern and an increasing trend (Fig. 1.1). It is not reasonable to 
claim the time series is a realisation of a stationary model. But, whilst the 
population acf was defined only for a stationary time series model, the sample 
acf can be calculated for any time series, including deterministic signals. Some 
results for deterministic signals are helpful for explaining patterns in the acf 
of time series that we do not consider realisations of some stationary process: 


• If you construct a time series that consists of a trend only, the integers from
  1 up to 1000 for example, the acf decreases slowly and almost linearly from
  1.
• If you take a large number of cycles of a discrete sinusoidal wave of any
  amplitude and phase, the acf is a discrete cosine function of the same
  period.
• If you construct a time series that consists of an arbitrary sequence of p
  numbers repeated many times, the correlogram has a dominant spike of
  almost 1 at lag p.
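
These three results are easy to reproduce. In the sketch below, the period of 12, the amplitude, the phase, and the repeated sequence of five numbers are arbitrary choices made for illustration.

```r
# Sketch: sample acf of three deterministic signals.

# trend only: the integers from 1 to 1000
trend.acf <- as.numeric(acf(1:1000, lag.max = 30, plot = FALSE)$acf)

# 100 cycles of a sinusoid with period 12 (arbitrary amplitude and phase)
tt <- 1:1200
wave.acf <- as.numeric(acf(3 * sin(2 * pi * tt / 12 + 0.4),
                           lag.max = 24, plot = FALSE)$acf)

# an arbitrary sequence of p = 5 numbers repeated 200 times
rep.acf <- as.numeric(acf(rep(c(4, 1, 7, 2, 9), times = 200),
                          lag.max = 10, plot = FALSE)$acf)

round(trend.acf[c(1, 11, 21, 31)], 3)   # lags 0, 10, 20, 30: slow, nearly linear decay
round(wave.acf[c(7, 13)], 3)            # lags 6 and 12: trough and peak of the cosine
round(rep.acf[6], 3)                    # lag 5: dominant spike close to 1
```

Replacing plot = FALSE with the default plots the corresponding correlograms.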


Usually a trend in the data will show in the correlogram as a slow decay in 
the autocorrelations, which are large and positive due to similar values in the 
series occurring close together in time. This can be seen in the correlogram for 
the air passenger bookings acf(AirPassengers) (Fig. 2.6). If there is seasonal
variation, seasonal spikes will be superimposed on this pattern. The annual
cycle appears in the air passenger correlogram as a cycle of the same period
superimposed on the gradually decaying ordinates of the acf. This gives a
maximum at a lag of 1 year, reflecting a positive linear relationship between
pairs of variables ($x_t$, $x_{t+12}$) separated by 12-month periods. Conversely,
because the seasonal trend is approximately sinusoidal, values separated by a
period of 6 months will tend to have a negative relationship. For example, 
higher values tend to occur in the summer months followed by lower values 
in the winter months. A dip in the acf therefore occurs at lag 6 months (or 
0.5 years). Although this is typical for seasonal variation that is approximated 
by a sinusoidal curve, other series may have patterns, such as high sales at 
Christmas, that contribute a single spike to the correlogram. 


2.3.2 Example based on air passenger series 


Although we want to know about trends and seasonal patterns in a time series,
we do not necessarily rely on the correlogram to identify them. The main use
of the correlogram is to detect autocorrelations in the time series after we
have removed an estimate of the trend and seasonal variation. In the code
below, the air passenger series is seasonally adjusted and the trend removed
using decompose. To plot the random component and draw the correlogram,
we need to remember that a consequence of using a centred moving average of
12 months to smooth the time series, and thereby estimate the trend, is that
the first six and last six terms in the random component cannot be calculated
and are thus stored in R as NA. The random component and correlogram are
shown in Figures 2.7 and 2.8, respectively.

Fig. 2.6. Correlogram for the air passenger bookings over the period 1949-1960.
The gradual decay is typical of a time series containing a trend. The peak at 1 year
indicates seasonal variation.


> data(AirPassengers)
> AP <- AirPassengers
> AP.decom <- decompose(AP, "multiplicative")
> plot(ts(AP.decom$random[7:138]))
> acf(AP.decom$random[7:138])


The correlogram in Figure 2.8 suggests either a damped cosine shape that
is characteristic of an autoregressive model of order 2 (Chapter 4) or that the
seasonal adjustment has not been entirely effective. The latter explanation is
unlikely because the decomposition does estimate twelve independent monthly
indices. If we investigate further, we see that the standard deviation of the
original series from July until June is 109, the standard deviation of the series
after subtracting the trend estimate is 41, and the standard deviation after
seasonal adjustment is just 0.03.

> sd(AP[7:138])
[1] 109
> sd(AP[7:138] - AP.decom$trend[7:138])
[1] 41.1
> sd(AP.decom$random[7:138])
[1] 0.0335

The reduction in the standard deviation shows that the seasonal adjustment
has been very effective.

Fig. 2.7. The random component of the air passenger series after removing the
trend and the seasonal variation.

Fig. 2.8. Correlogram for the random component of air passenger bookings over
the period 1949-1960.
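
The index range 7:138 used above depends on knowing where the missing values lie. As an alternative sketch, not the approach used in the text, the positions of the NA values can be checked and then dropped with na.omit; the object name AP.random is our own.

```r
# Sketch: locating and dropping the NA values in the random component.
AP <- AirPassengers                               # built-in data set
AP.decom <- decompose(AP, "multiplicative")

which(is.na(AP.decom$random))                     # positions that could not be estimated
AP.random <- na.omit(AP.decom$random)             # drops the leading and trailing NAs

length(AP.random)                                 # number of usable values
sd(AP.random)                                     # same quantity as sd(AP.decom$random[7:138])
```

This only works because the missing values occur at the two ends of the series; na.omit is not intended for a time series with missing values in its interior.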
2.3.3 Example based on the Font Reservoir series


Monthly effective inflows ($m^3 s^{-1}$) to the Font Reservoir in Northumberland
for the period from January 1909 until December 1980 have been provided by 
Northumbrian Water PLC. A plot of the data is shown in Figure 2.9. There 
was a slight decreasing trend over this period, and substantial seasonal vari- 
ation. The trend and seasonal variation have been estimated by regression, 
as described in Chapter 5, and the residual series (adflow), which we anal- 
yse here, can reasonably be considered a realisation from a stationary time 
series model. The main difference between the regression approach and us- 
ing decompose is that the former assumes a linear trend, whereas the latter 
smooths the time series without assuming any particular form for the trend. 
The correlogram is plotted in Figure 2.10. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/Fontdsdt.dat"
> Fontdsdt.dat <- read.table(www, header = TRUE)
> attach(Fontdsdt.dat)
> plot(ts(adflow), ylab = 'adflow')
> acf(adflow, xlab = 'lag (months)', main = "")

Fig. 2.9. Adjusted inflows to the Font Reservoir, 1909-1980.


There is a statistically significant correlation at lag 1. The physical
interpretation is that the inflow next month is more likely than not to be above
average if the inflow this month is above average. Similarly, if the inflow this
month is below average it is more likely than not that next month’s inflow
will be below average. The explanation is that the groundwater supply can be
thought of as a slowly discharging reservoir. If groundwater is high one month
it will augment inflows, and is likely to do so next month as well. Given this
explanation, you may be surprised that the lag 1 correlation is not higher.
The explanation for this is that most of the inflow is runoff following rainfall,
and in Northumberland there is little correlation between seasonally adjusted
rainfall in consecutive months. An exponential decay in the correlogram is
typical of a first-order autoregressive model (Chapter 4). The correlogram of
the adjusted inflows is consistent with an exponential decay. However, given
the sampling errors for a time series of this length, estimates of
autocorrelation at higher lags are unlikely to be statistically significant. This is not a
practical limitation because such low correlations are inconsequential. When
we come to identify suitable models, we should remember that there is no one
correct model and that there will often be a choice of suitable models. We
may make use of a specific statistical criterion such as Akaike’s information
criterion, introduced in Chapter 5, to choose a model, but this does not imply
that the model is correct.

Fig. 2.10. Correlogram for adjusted inflows to the Font Reservoir, 1909-1980.


2.4 Covariance of sums of random variables 


In subsequent chapters, second-order properties for several time series models
are derived using the result shown in Equation (2.15). Let $x_1, x_2, \ldots, x_n$ and
$y_1, y_2, \ldots, y_m$ be random variables. Then

$$\mathrm{Cov}\left(\sum_{i=1}^{n} x_i, \sum_{j=1}^{m} y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{Cov}(x_i, y_j) \qquad (2.15)$$

where Cov(x, y) is the covariance between a pair of random variables x and
y. The result tells us that the covariance of two sums of variables is the sum
of all possible covariance pairs of the variables. Note that the special case of
n = m and $x_i = y_i$ (i = 1, ..., n) occurs in subsequent chapters for a time
series {$x_t$}. The proof of Equation (2.15) is left to Exercise 5a.
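
Sample covariances obey the same identity, so Equation (2.15) can be checked numerically. The sketch below uses simulated data; the sample size, the choice of n = 3 and m = 2, and the way the y variables are made to depend on the x variables are arbitrary assumptions made for illustration.

```r
# Sketch: numerical check of Equation (2.15) using sample covariances.
set.seed(4)                                      # arbitrary seed
N <- 200                                         # observations per variable
X <- matrix(rnorm(N * 3), ncol = 3)              # columns play the role of x_1, x_2, x_3
Y <- matrix(rnorm(N * 2), ncol = 2) + X[, 1:2]   # y_1, y_2, correlated with x_1, x_2

lhs <- cov(rowSums(X), rowSums(Y))               # covariance of the two sums
rhs <- sum(cov(X, Y))                            # sum of all 3 x 2 pairwise covariances
all.equal(lhs, rhs)
```

Here cov(X, Y) returns the matrix of covariances between each column of X and each column of Y, so summing its entries gives the right-hand side of the equation.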


2.5 Summary of commands used in examples 


mean    returns the mean (average)
var     returns the variance with denominator n - 1
sd      returns the standard deviation
cov     returns the covariance with denominator n - 1
cor     returns the correlation
acf     returns the correlogram (or sets the argument
        to obtain autocovariance function)


2.6 Exercises 


1. On the book's website, you will find two small bivariate data sets that are 
not time series. Draw a scatter plot for each set and then calculate the 
correlation. Comment on your results. 


a) The data in the file varnish.dat are the amount of catalyst in a
varnish, x, and the drying time of a set volume in a petri dish, y.


b) The data in the file guesswhat.dat are data pairs. Can you see a 
pattern? Can you guess what they represent? 


2. The following data are the volumes, relative to nominal contents of 750 ml, 
of 16 bottles taken consecutively from the filling machine at the Serendip- 
ity Shiraz vineyard: 


39, 35, 16, 18, 7, 22, 13, 18, 20, 9, -12, -11, -19, -9, -2, 16


The following are the volumes, relative to nominal contents of 750 ml, of
consecutive bottles taken from the filling machine at the Cagey
Chardonnay vineyard:


47, -26, 42, -10, 27, -8, 16, 6, -1, 25, 11, 1, 25, 7, -5, 3


The data are also available from the website in the file ch2ex2.dat.
a) Produce time plots of the two time series. 

b) For each time series, draw a lag 1 scatter plot. 

c) Produce the acf for both time series and comment. 


3. Carry out the following exploratory time series analysis using the global
temperature series from §1.4.5.

a) Decompose the series into the components trend, seasonal effect, and
residuals. Plot these components. Would you expect these data to have
a substantial seasonal component? Compare the standard deviation of
the original series with the deseasonalised series. Produce a plot of the
trend with a superimposed seasonal effect.

b) Plot the correlogram of the residuals (random component) from part
(a). Comment on the plot, with particular reference to any statistically
significant correlations.


4. The monthly effective inflows ($m^3 s^{-1}$) to the Font Reservoir are in the file
Font.dat. Use decompose on the time series and then plot the correlogram
of the random component. Compare this with Figure 2.10 and comment.


5.

a) Prove Equation (2.15), using the following properties of summation,
expectation, and covariance:

$$\sum_{i=1}^{n} x_i \sum_{j=1}^{m} y_j = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_j$$

$$E\left[\sum_{i=1}^{n} x_i\right] = \sum_{i=1}^{n} E(x_i)$$

$$\mathrm{Cov}(x, y) = E(xy) - E(x)E(y)$$

b) By taking n = m = 2 and $x_i = y_i$ in Equation (2.15), derive the
well-known result

$$\mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x, y)$$

c) Verify the result in part (b) above using R with x and y (CO and
Benzoa, respectively) taken from §2.2.1.
Forecasting Strategies


3.1 Purpose 


Businesses rely on forecasts of sales to plan production, justify marketing de- 
cisions, and guide research. A very efficient method of forecasting one variable 
is to find a related variable that leads it by one or more time intervals. The 
closer the relationship and the longer the lead time, the better this strategy 
becomes. The trick is to find a suitable lead variable. An Australian example 
is the Building Approvals time series published by the Australian Bureau of 
Statistics. This provides valuable information on the likely demand over the 
next few months for all sectors of the building industry. A variation on the 
strategy of seeking a leading variable is to find a variable that is associated 
with the variable we need to forecast and easier to predict. 

In many applications, we cannot rely on finding a suitable leading variable 
and have to try other methods. A second approach, common in marketing, 
is to use information about the sales of similar products in the past. The in- 
fluential Bass diffusion model is based on this principle. A third strategy is 
to make extrapolations based on present trends continuing and to implement 
adaptive estimates of these trends. The statistical technicalities of forecast- 
ing are covered throughout the book, and the purpose of this chapter is to 
introduce the general strategies that are available. 


3.2 Leading variables and associated variables 


3.2.1 Marine coatings 


A leading international marine paint company uses statistics available in the 
public domain to forecast the numbers, types, and sizes of ships to be built 
over the next three years. One source of such information is World Shipyard 
Monitor, which gives brief details of orders in over 300 shipyards. The paint 
company has set up a database of ship types and sizes from which it can
forecast the areas to be painted and hence the likely demand for paint. The
company monitors its market share closely and uses the forecasts for planning 
production and setting prices. 


3.2.2 Building approvals publication 
Building approvals and building activity time series 


The Australian Bureau of Statistics publishes detailed data on building ap- 
provals for each month, and, a few weeks later, the Building Activity Publi- 
cation lists the value of building work done in each quarter. The data in the 
file ApprovActiv.dat are the total dwellings approved per month, averaged 
over the past three months, labelled “Approvals”, and the value of work done 
over the past three months (chain volume measured in millions of Australian 
dollars at the reference year 2004-05 prices), labelled “Activity”, from March 
1996 until September 2006. We start by reading the data into R and then 
construct time series objects and plot the two series on the same graph using 
ts.plot (Fig. 3.1). 


> www <- "http://www.massey.ac.nz/~pscowper/ts/ApprovActiv.dat"
> Build.dat <- read.table(www, header = TRUE) ; attach(Build.dat)
> App.ts <- ts(Approvals, start = c(1996,1), freq = 4)
> Act.ts <- ts(Activity, start = c(1996,1), freq = 4)
> ts.plot(App.ts, Act.ts, lty = c(1,3))

Fig. 3.1. Building approvals (solid line) and building activity (dotted line).


In Figure 3.1, we can see that the building activity tends to lag one quarter 
behind the building approvals, or equivalently that the building approvals
appear to lead the building activity by a quarter. The cross-correlation function,
which is abbreviated to ccf, can be used to quantify this relationship. A plot of
the cross-correlation function against lag is referred to as a cross-correlogram. 


Cross-correlation 


Suppose we have time series models for variables x and y that are stationary 
in the mean and the variance. The variables may each be serially correlated, 
and correlated with each other at different time lags. The combined model is 
second-order stationary if all these correlations depend only on the lag, and 
then we can define the cross covariance function (ccvf), $\gamma_k(x, y)$, as a function
of the lag, k:

$$\gamma_k(x, y) = E\left[(x_{t+k} - \mu_x)(y_t - \mu_y)\right] \qquad (3.1)$$


This is not a symmetric relationship, and the variable x is lagging variable 
y by k. If x is the input to some physical system and y is the response, the 
cause will precede the effect, y will lag x, the ccvf will be 0 for positive k, and 
there will be spikes in the ccvf at negative lags. Some textbooks define ccvf 
with the variable y lagging when k is positive, but we have used the definition 
that is consistent with R. Whichever way you choose to define the ccvf, 


$$\gamma_k(x, y) = \gamma_{-k}(y, x) \qquad (3.2)$$


When we have several variables and wish to refer to the acvf of one rather 
than the ccvf of a pair, we can write it as, for example, $\gamma_k(x, x)$. The lag k
cross-correlation function (ccf), $\rho_k(x, y)$, is defined by

$$\rho_k(x, y) = \frac{\gamma_k(x, y)}{\sigma_x \sigma_y} \qquad (3.3)$$


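The ccvf and ccf can be estimated from a time series by their sample
equivalents. The sample ccvf, $c_k(x, y)$, is calculated as

$$c_k(x, y) = \frac{1}{n} \sum_{t=1}^{n-k} (x_{t+k} - \bar{x})(y_t - \bar{y}) \qquad (3.4)$$

The sample ccf is defined as

$$r_k(x, y) = \frac{c_k(x, y)}{\sqrt{c_0(x, x)\, c_0(y, y)}} \qquad (3.5)$$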
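
The sample ccf, and the sign convention for the lag, can be checked against R's ccf function. The following is a sketch on simulated data, not the building series: the response y is constructed to follow the input x two time steps later, and the helper names sample.ccvf and sample.ccf are our own.

```r
# Sketch: sample ccf from Equations (3.4) and (3.5), compared with ccf().
set.seed(5)                          # arbitrary seed
n <- 200
z <- rnorm(n + 2)
x <- z[3:(n + 2)]                    # input series
y <- z[1:n] + 0.5 * rnorm(n)         # response: y_t = x_{t-2} + noise

sample.ccvf <- function(x, y, k) {
  # Equation (3.4) for k >= 0; negative lags use Equation (3.2)
  if (k < 0) return(sample.ccvf(y, x, -k))
  n <- length(x)
  sum((x[(1 + k):n] - mean(x)) * (y[1:(n - k)] - mean(y))) / n
}
sample.ccf <- function(x, y, k) {
  sample.ccvf(x, y, k) / sqrt(sample.ccvf(x, x, 0) * sample.ccvf(y, y, 0))
}

r.hand <- sapply(-3:3, function(k) sample.ccf(x, y, k))
r.R <- as.numeric(ccf(x, y, lag.max = 3, plot = FALSE)$acf)
all.equal(r.hand, r.R)               # agreement at lags -3, ..., 3
(-3:3)[which.max(r.hand)]            # lag with the largest cross-correlation
```

Because x is the input and y the delayed response, the largest cross-correlation should appear at a negative lag, in line with the discussion following Equation (3.1).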


Cross-correlation between building approvals and activity 


The ts.union function binds time series with a common frequency, padding
with ‘NA’s to the union of their time coverages. If ts.union is used within
the acf command, R returns the correlograms for the two variables and the
cross-correlograms in a single figure.

> acf(ts.union(App.ts, Act.ts))

Fig. 3.2. Correlogram and cross-correlogram for building approvals and building
activity.


In Figure 3.2, the acfs for x and y are in the upper left and lower right
frames, respectively, and the ccfs are in the lower left and upper right frames.
The time unit for lag is one year, so a correlation at a lag of one quarter
appears at 0.25. If the variables are independent, we would expect 5% of sample
correlations to lie outside the dashed lines. Several of the cross-correlations
at negative lags do pass these lines, indicating that the approvals time series
is leading the activity. Numerical values can be printed using the print()
function, and are 0.432, 0.494, 0.499, and 0.458 at lags of 0, 1, 2, and 3,
respectively. The ccf can be calculated for any two time series that overlap,
but if they both have trends or similar seasonal effects, these will dominate
(Exercise 1). It may be that common trends and seasonal effects are precisely
what we are looking for, but the population ccf is defined for stationary
random processes and it is usual to remove the trend and seasonal effects before
investigating cross-correlations. Here we remove the trend using decompose,
which uses a centred moving average of the four quarters (see Fig. 3.3). We
will discuss the use of ccf in later chapters.
> app.ran <- decompose(App.ts)$random
> app.ran.ts <- window(app.ran, start = c(1996, 3))
> act.ran <- decompose(Act.ts)$random
> act.ran.ts <- window(act.ran, start = c(1996, 3))
> acf(ts.union(app.ran.ts, act.ran.ts))
> ccf(app.ran.ts, act.ran.ts)


We again use print() to obtain the following table. 


> print(acf(ts.union(app.ran.ts, act.ran.ts))) 


app.ran.ts act.ran.ts 
1.000 ( 0.00) 0.123 ( 0.00) 
0.422 ( 0.25) 0.704 (-0.25) 
-0.328 ( 0.50) 0.510 (-0.50) 
-0.461 ( 0.75) -0.135 (-0.75) 
-0.400 ( 1.00) -0.341 (-1.00) 
-0.193 ( 1.25) -0.187 (-1.25) 
app.ran.ts act.ran.ts 
0.123 ( 0.00) 1.000 ( 0.00) 
-0.400 ( 0.25) 0.258 ( 0.25) 
-0.410 ( 0.50) -0.410 ( 0.50) 
-0.250 ( 0.75) -0.411 ( 0.75) 
0.071 ( 1.00) -0.112 ( 1.00) 
0.353 ( 1.25) 0.180 ( 1.25) 


The ccf function produces a single plot, shown in Figure 3.4, and again
shows the lagged relationship. The Australian Bureau of Statistics publishes 
the building approvals by state and by other categories, and specific sectors of 
the building industry may find higher correlations between demand for their 
products and one of these series than we have seen here. 


3.2.3 Gas supply 


Gas suppliers typically have to place orders for gas from offshore fields 24 hours 
ahead. Variation about the average use of gas, for the time of year, depends 
on temperature and, to some extent, humidity and wind speed. Coleman et al. 
(2001) found that the weather accounts for 90% of this variation in the United
Kingdom. Weather forecasts for the next 24 hours are now quite accurate and 
are incorporated into the forecasting procedure.

Fig. 3.3. Correlogram and cross-correlogram of the random components of building
approvals and building activity after using decompose.

Fig. 3.4. Cross-correlogram of the random components of building approvals and
building activity after using decompose.


3.3 Bass model


3.3.1 Background 


Frank Bass published a paper describing his mathematical model, which quan- 
tified the theory of adoption and diffusion of a new product by society (Rogers, 
1962), in Management Science nearly fifty years ago (Bass, 1969). The mathe- 
matics is straightforward, and the model has been influential in marketing. An 
entrepreneur with a new invention will often use the Bass model when mak- 
ing a case for funding. There is an associated demand for market research, as 
demonstrated, for example, by the Marketing Science Centre at the Univer- 
sity of South Australia becoming the Ehrenberg-Bass Institute for Marketing 
Science in 2005. 


3.3.2 Model definition 


The Bass formula for the number of people, $N_t$, who have bought a product at
time t depends on three parameters: the total number of people who eventually
buy the product, m; the coefficient of innovation, p; and the coefficient of
imitation, q. The Bass formula is

$$N_{t+1} = N_t + p(m - N_t) + q N_t (m - N_t)/m \qquad (3.6)$$

According to the model, the increase in sales, $N_{t+1} - N_t$, over the next time
period is equal to the sum of a fixed proportion p and a time varying proportion
$q N_t / m$ of people who will eventually buy the product but have not yet done so.
The rationale for the model is that initial sales will be to people who are
interested in the novelty of the product, whereas later sales will be to people
who are drawn to the product after seeing their friends and acquaintances use
it. Equation (3.6) is a difference equation and its solution is

$$N_t = m\,\frac{1 - e^{-(p+q)t}}{1 + (q/p)e^{-(p+q)t}} \qquad (3.7)$$


It is easier to verify this result for the continuous-time version of the model. 
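
To see how the recursion in Equation (3.6) behaves, it can be iterated directly and set beside the curve in Equation (3.7). This is a sketch with assumed values: m = 1000 is hypothetical, and p = 0.03 and q = 0.38 are illustrative. Since Equation (3.7) is most easily verified for the continuous-time version, the two columns should have the same S-shape but need not coincide exactly with a time step of one period.

```r
# Sketch: iterate the Bass difference equation and compare with the closed form.
m <- 1000; p <- 0.03; q <- 0.38      # illustrative values only
n.periods <- 20

N <- numeric(n.periods + 1)          # N[1] holds N_0 = 0, N[i + 1] holds N_i
for (i in 1:n.periods) {
  N[i + 1] <- N[i] + p * (m - N[i]) + q * N[i] * (m - N[i]) / m
}

tt <- 0:n.periods
N.cont <- m * (1 - exp(-(p + q) * tt)) / (1 + (q / p) * exp(-(p + q) * tt))

round(cbind(t = tt, difference.eq = N, formula = N.cont))
```

Both columns start at zero and level off towards m, the total number of eventual buyers.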


3.3.3 Interpretation of the Bass model* 


One interpretation of the Bass model is that the time from product launch 
until purchase is assumed to have a probability distribution that can be 
parametrised in terms of p and q. A plot of sales per time unit against time is 
obtained by multiplying the probability density by the number of people, m, 
who eventually buy the product. Let f(t), F(t), and h(t) be the density, cumu- 
lative distribution function (cdf), and hazard, respectively, of the distribution 
of time until purchase. The definition of the hazard is

$$h(t) = \frac{f(t)}{1 - F(t)} \qquad (3.8)$$

The interpretation of the hazard is that if it is multiplied by a small time
increment it gives the probability that a random purchaser who has not yet
made the purchase will do so in the next small time increment (Exercise 2).
Then the continuous time model of the Bass formula can be expressed in terms
of the hazard:

$$h(t) = p + qF(t) \qquad (3.9)$$

Equation (3.6) is the discrete form of Equation (3.9) (Exercise 2). The solution
of Equation (3.8), with h(t) given by Equation (3.9), for F(t) is

$$F(t) = \frac{1 - e^{-(p+q)t}}{1 + (q/p)e^{-(p+q)t}} \qquad (3.10)$$

Two special cases of the distribution are the exponential distribution and
logistic distribution, which arise when q = 0 and p = 0, respectively. The logistic
distribution closely resembles the normal distribution (Exercise 3). Cumulative
sales are given by the product of m and F(t). The pdf is the derivative
of Equation (3.10):

$$f(t) = \frac{(p+q)^2 e^{-(p+q)t}}{p\left[1 + (q/p)e^{-(p+q)t}\right]^2} \qquad (3.11)$$

Sales per unit time at time t are

$$S(t) = m f(t) = \frac{m (p+q)^2 e^{-(p+q)t}}{p\left[1 + (q/p)e^{-(p+q)t}\right]^2} \qquad (3.12)$$

The time to peak is

$$t_{peak} = \frac{\log(q) - \log(p)}{p + q} \qquad (3.13)$$
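
The relationships between the hazard, the cdf, the pdf, and the time to peak can be checked numerically. The sketch below assumes illustrative values p = 0.03 and q = 0.38; the function names Fcdf and fpdf are our own.

```r
# Sketch: numerical checks of the hazard relationship and the time to peak.
p <- 0.03; q <- 0.38                        # illustrative values only

Fcdf <- function(t) (1 - exp(-(p + q) * t)) / (1 + (q / p) * exp(-(p + q) * t))
fpdf <- function(t) ((p + q)^2 / p) * exp(-(p + q) * t) /
                    (1 + (q / p) * exp(-(p + q) * t))^2

tt <- seq(0, 20, by = 0.5)
# hazard from its definition, f/(1 - F), against the Bass form p + q F(t)
all.equal(fpdf(tt) / (1 - Fcdf(tt)), p + q * Fcdf(tt))

# time to peak: closed form against a numerical maximisation of the density
t.peak <- (log(q) - log(p)) / (p + q)
t.num <- optimize(fpdf, interval = c(0, 20), maximum = TRUE)$maximum
c(formula = t.peak, numerical = t.num)
```

The two values of the time to peak should agree to within the tolerance of the numerical search.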


3.3.4 Example 


We show a typical Bass curve by fitting Equation (3.12) to yearly sales of 
VCRs in the US home market between 1980 and 1989 (Bass website) using 
the R non-linear least squares function nls. The variable T79 is the year from 
1979, and the variable Tdelt is the time from 1979 at a finer resolution of 
0.1 year for plotting the Bass curves. The cumulative sum function cumsum is 
useful for monitoring changes in the mean level of the process (Exercise 8). 


> T79 <- 1:10
> Tdelt <- (1:100) / 10
> Sales <- c(840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890)
> Cusales <- cumsum(Sales)
> Bass.nls <- nls(Sales ~ M * ( ((P+Q)^2 / P) * exp(-(P+Q) * T79) ) /
    (1 + (Q/P) * exp(-(P+Q) * T79))^2, start = list(M = 60630, P = 0.03, Q = 0.38))
> summary(Bass.nls)

Parameters:
   Estimate Std. Error t value Pr(>|t|)
M 6.798e+04  3.128e+03   21.74 1.10e-07 ***
P 6.594e-03  1.430e-03    4.61  0.00245 **
Q 6.381e-01  4.140e-02   15.41 1.17e-06 ***

Residual standard error: 727.2 on 7 degrees of freedom


The final estimates for m, p, and q, rounded to two significant figures, are
68000, 0.0066, and 0.64 respectively. The starting values for P and Q are p and
q for a typical product. We assume the sales figures are prone to error and
estimate the total sales, m, setting the starting value for M to the recorded
total sales. The data and fitted curve can be plotted using the code below (see
Figs. 3.5 and 3.6):


> Bcoef <- coef(Bass.nls)
> m <- Bcoef[1]
> p <- Bcoef[2]
> q <- Bcoef[3]
> ngete <- exp(-(p+q) * Tdelt)
> Bpdf <- m * ( (p+q)^2 / p ) * ngete / (1 + (q/p) * ngete)^2
> plot(Tdelt, Bpdf, xlab = "Year from 1979",
     ylab = "Sales per year", type='l')
> points(T79, Sales)
> Bcdf <- m * (1 - ngete)/(1 + (q/p)*ngete)
> plot(Tdelt, Bcdf, xlab = "Year from 1979",
     ylab = "Cumulative sales", type='l')
> points(T79, Cusales)
Fig. 3.5. Bass sales curve fitted to sales of VCRs in the US home market, 1980-1989.

Fig. 3.6. Bass cumulative sales curve, obtained as the integral of the sales curve,
and cumulative sales of VCRs in the US home market, 1980-1989.
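One use of the fitted parameters is to evaluate the time to peak in Equation (3.13). The sketch below refits the model with the data and starting values given above and converts the result to a calendar year; the variable names est, t_peak, and peak_year are ours. With the rounded estimates p = 0.0066 and q = 0.64, Equation (3.13) gives a peak roughly seven years after 1979, which is consistent with the largest recorded sales occurring in 1985 and 1986.

```r
# VCR sales data and model formula as given in the text
T79 <- 1:10
Sales <- c(840, 1470, 2110, 4000, 7590, 10950, 10530, 9470, 7790, 5890)
Bass.nls <- nls(Sales ~ M * (((P + Q)^2 / P) * exp(-(P + Q) * T79)) /
                  (1 + (Q / P) * exp(-(P + Q) * T79))^2,
                start = list(M = 60630, P = 0.03, Q = 0.38))
est <- coef(Bass.nls)

# Time to peak from Equation (3.13), in years from 1979
t_peak <- unname((log(est["Q"]) - log(est["P"])) / (est["P"] + est["Q"]))
peak_year <- 1979 + t_peak
```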


It is easy to fit a curve to past sales data. The importance of the Bass 
curve in marketing is in forecasting, which needs values for the parameters m, 
p, and q. Plausible ranges for the parameter values can be based on published 
data for similar categories of past inventions, and a few examples follow. 


Product                          m               p      q      Reference

Typical product                  -               0.030  0.380  VBM¹
35 mm projectors, 1965-1986      3.37 million    0.009  0.173  Bass²
Overhead projectors, 1960-1970   0.961 million   0.028  0.311  Bass
PCs, 1981-2010                   3.384 billion   0.001  0.195  Bass

¹ Value-Based Management; ² Frank M. Bass, 1999.


Although the forecasts are inevitably uncertain, they are the best informa- 
tion available when making marketing and investment decisions. A prospectus 
for investors or a report to the management team will typically include a set 
of scenarios based on the most likely, optimistic, and pessimistic sets of pa- 
rameters. 

The basic Bass model does not allow for replacement sales and multiple 
purchases. Extensions of the model that allow for replacement sales, multiple 
purchases, and the effects of pricing and advertising in a competitive market 
have been proposed (for example, Mahajan et al. 2000). However, there are 
several reasons why these refinements may be of less interest to investors than 
you might expect. The first is that the profit margin on manufactured goods, 
such as innovative electronics and pharmaceuticals, will drop dramatically 
once patent protection expires and competitors enter the market. A second 
reason is that successful inventions are often superseded by new technology, as 
VCRs have been by DVD players, and replacement sales are limited. Another
reason is that many investors are primarily interested in a relatively quick 
return on their money. You are asked to consider Bass models for sales of two 
recent 3G mobile communication devices in Exercise 4. 


3.4 Exponential smoothing & the Holt-Winters method 
3.4.1 Exponential smoothing 


Our objective is to predict some future value $x_{n+k}$ given a past history
$\{x_1, x_2, \ldots, x_n\}$ of observations up to time $n$. In this subsection we assume
there is no systematic trend or seasonal effects in the process, or that these 
have been identified and removed. The mean of the process can change from 
one time step to the next, but we have no information about the likely direction 
of these changes. A typical application is forecasting sales of a well-established 
product in a stable market. The model is 


$$x_t = \mu_t + w_t \tag{3.14}$$

where $\mu_t$ is the non-stationary mean of the process at time $t$ and $w_t$ are
independent random deviations with a mean of 0 and a standard deviation $\sigma$.
We will follow the notation in R and let $a_t$ be our estimate of $\mu_t$. Given that
there is no systematic trend, an intuitively reasonable estimate of the mean
at time $t$ is given by a weighted average of our observation at time $t$ and our
estimate of the mean at time $t - 1$:

$$a_t = \alpha x_t + (1 - \alpha)a_{t-1} \qquad 0 < \alpha < 1 \tag{3.15}$$

The $a_t$ in Equation (3.15) is the exponentially weighted moving average
(EWMA) at time $t$. The value of $\alpha$ determines the amount of smoothing,
and it is referred to as the smoothing parameter. If $\alpha$ is near 1, there is little
smoothing and $a_t$ is approximately $x_t$. This would only be appropriate if the
changes in the mean level were expected to be large by comparison with $\sigma$. At
the other extreme, a value of $\alpha$ near 0 gives highly smoothed estimates of the
mean level and takes little account of the most recent observation. This would
only be appropriate if the changes in the mean level were expected to be small
compared with $\sigma$. A typical compromise figure for $\alpha$ is 0.2 since in practice
we usually expect that the change in the mean between time $t - 1$ and time
$t$ is likely to be smaller than $\sigma$. Alternatively, R can provide an estimate for
$\alpha$, and we discuss this option below. Since we have assumed that there is no
systematic trend and that there are no seasonal effects, forecasts made at time
$n$ for any lead time are just the estimated mean at time $n$. The forecasting
equation is

$$\hat{x}_{n+k|n} = a_n \qquad k = 1, 2, \ldots \tag{3.16}$$
Equation (3.15), for $a_t$, can be rewritten in two other useful ways. Firstly,
we can write $a_t$ as the sum of $a_{t-1}$ and a proportion of the one-step-ahead forecast
error, $x_t - a_{t-1}$,

$$a_t = \alpha(x_t - a_{t-1}) + a_{t-1} \tag{3.17}$$

Secondly, by repeated back substitution we obtain

$$a_t = \alpha x_t + \alpha(1 - \alpha)x_{t-1} + \alpha(1 - \alpha)^2 x_{t-2} + \ldots \tag{3.18}$$

When written in this form, we see that $a_t$ is a linear combination of the current
and past observations, with more weight given to the more recent observations.
The restriction $0 < \alpha < 1$ ensures that the weights $\alpha(1 - \alpha)^i$ become smaller
as $i$ increases. Note that these weights form a geometric series, and the sum
of the infinite series is unity (Exercise 5). We can avoid the infinite regression
by specifying $a_1 = x_1$ in Equation (3.15).
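The recursion in Equation (3.15) and the weighted-sum form in Equation (3.18) can be compared numerically. The sketch below is an illustration only: the function name ewma is ours, the series is simulated, and the starting value $a_1 = x_1$ is the one suggested above.

```r
# Hypothetical helper: EWMA recursion of Equation (3.15) with a_1 = x_1
ewma <- function(x, alpha) {
  a <- numeric(length(x))
  a[1] <- x[1]
  for (t in 2:length(x)) a[t] <- alpha * x[t] + (1 - alpha) * a[t - 1]
  a
}

set.seed(1)
x <- 20 + rnorm(50)          # simulated data, for illustration only
alpha <- 0.2
a <- ewma(x, alpha)
n <- length(x)

# Weights on x_n, x_{n-1}, ..., x_2 from Equation (3.18); the remaining
# weight (1 - alpha)^(n - 1) falls on the starting value a_1 = x_1
w <- c(alpha * (1 - alpha)^(0:(n - 2)), (1 - alpha)^(n - 1))
a_n_from_weights <- sum(w * rev(x))

all.equal(a[n], a_n_from_weights)   # the two forms agree
```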

For any given $\alpha$, the model in Equation (3.17) together with the starting
value $a_1 = x_1$ can be used to calculate $a_t$ for $t = 2, 3, \ldots, n$. One-step-ahead
prediction errors, $e_t$, are given by

$$e_t = x_t - \hat{x}_{t|t-1} = x_t - a_{t-1} \tag{3.19}$$

By default, R obtains a value for the smoothing parameter, $\alpha$, by minimising
the sum of squared one-step-ahead prediction errors (SS1PE):

$$SS1PE = \sum_{t=2}^{n} e_t^2 = e_2^2 + e_3^2 + \ldots + e_n^2 \qquad a_1 = x_1 \tag{3.20}$$
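To make the criterion concrete, the following sketch evaluates SS1PE for a given $\alpha$ and minimises it with the general-purpose optimize function. The helper name ss1pe and the simulated series are assumptions made for illustration; the optimiser used inside R's own smoothing function may differ in detail.

```r
# Hypothetical helper: SS1PE of Equation (3.20) for a given alpha
ss1pe <- function(alpha, x) {
  a <- x[1]                       # starting value a_1 = x_1
  sse <- 0
  for (t in 2:length(x)) {
    e <- x[t] - a                 # one-step-ahead error, Equation (3.19)
    sse <- sse + e^2
    a <- a + alpha * e            # update, Equation (3.17)
  }
  sse
}

set.seed(1)
# Simulated series with a slowly drifting mean, for illustration only
x <- 20 + cumsum(rnorm(48, sd = 0.3)) + rnorm(48)

opt <- optimize(ss1pe, interval = c(0, 1), x = x)
opt$minimum     # estimated alpha
opt$objective   # minimised SS1PE
```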


However, calculating $\alpha$ in this way is not necessarily the best practice. If the
time series is long and the mean has changed little, the value of $\alpha$ will be
small. In the specific case where the mean of the process does not change,
the optimum value for $\alpha$ is $1/n$. An exponential smoothing procedure set up
with a small value of $\alpha$ will be slow to respond to any unexpected change
in the market, as occurred in sales of videotapes, which plummeted after the
invention of DVDs.


Complaints to a motoring organisation 


The numbers of letters of complaint received each month by a motoring
organisation over the four years 1996 to 1999 are available on the website. At the
beginning of the year 2000, the organisation wishes to estimate the current 
level of complaints and investigate any trend in the level of complaints. We 
should first plot the data, and, even though there are only four years of data, 
we should check for any marked systematic trend or seasonal effects. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/motororg.dat"
> Motor.dat <- read.table(www, header = T); attach(Motor.dat)
> Comp.ts <- ts(complaints, start = c(1996, 1), freq = 12)
> plot(Comp.ts, xlab = "Time / months", ylab = "Complaints")
Fig. 3.7. Monthly numbers of letters of complaint received by a motoring
organisation.


There is no evidence of a systematic trend or seasonal effects, so it seems 
reasonable to use exponential smoothing for this time series. Exponential 
smoothing is a special case of the Holt-Winters algorithm, which we intro- 
duce in the next section, and is implemented in R using the HoltWinters 
function with the additional parameters set to 0. If we do not specify a value
for $\alpha$, R will find the value that minimises the sum of squared one-step-ahead
prediction errors.


> Comp.hw1 <- HoltWinters(complaints, beta = 0, gamma = 0) ; Comp.hw1
> plot(Comp.hw1)

Holt-Winters exponential smoothing without trend and without seasonal
component.

Smoothing parameters:
 alpha:  0.143
 beta :  0
 gamma:  0

Coefficients:
   [,1]
a 17.70

> Comp.hw1$SSE
[1] 2502


The estimated value of the mean number of letters of complaint per month
at the end of 1999 is 17.7. The value of $\alpha$ that gives a minimum SS1PE, of
2502, is 0.143. We now compare these results with those obtained if we specify
a value for $\alpha$ of 0.2.
> Comp.hw2 <- HoltWinters(complaints, alpha = 0.2, beta = 0, gamma = 0)
> Comp.hw2

Coefficients:
   [,1]
a 17.98

> Comp.hw2$SSE
[1] 2526
Fig. 3.8. Monthly numbers of letters and exponentially weighted moving average.


The estimated value of the mean number of letters of complaint per month 
at the end of 1999 is now 18.0, and the SS1PE has increased slightly to 2526. 
The advantage of letting R estimate a value for $\alpha$ is that it is optimum for a
practically important criterion, SS1PE, and that it removes the need to make 
a choice. However, the optimum estimate can be close to 0 if we have a long 
time series over a stable period, and this makes the EWMA unresponsive to 
any future change in mean level. From Figure 3.8, it seems that there was a 
decrease in the number of complaints at the start of the period and a slight rise 
towards the end, although this has not yet affected the exponentially weighted 
moving average. 
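In current versions of R, the help page for HoltWinters describes simple exponential smoothing as being requested with beta = FALSE and gamma = FALSE, so that form may be needed if the calls with beta = 0 and gamma = 0 do not run as shown. The sketch below uses simulated monthly counts (not the complaints data, which must be downloaded) and checks the fitted level and SSE against a direct calculation from Equations (3.15) and (3.19); the variable names are ours.

```r
set.seed(3)
y <- rpois(48, lambda = 18)       # simulated monthly counts, not the complaints data

# beta = FALSE and gamma = FALSE request simple exponential smoothing
fit <- HoltWinters(y, alpha = 0.2, beta = FALSE, gamma = FALSE)

# Manual EWMA with a_1 = y_1, and the one-step-ahead errors of Equation (3.19)
a <- numeric(length(y)); a[1] <- y[1]
for (t in 2:length(y)) a[t] <- 0.2 * y[t] + 0.8 * a[t - 1]
e <- y[-1] - a[-length(y)]

c(level = unname(coef(fit)), manual = a[length(y)])
c(SSE = fit$SSE, manual = sum(e^2))
```

If alpha is omitted from the call, the function estimates it from the series instead.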
3.4.2 Holt-Winters method


We usually have more information about the market than exponential smooth- 
ing can take into account. Sales are often seasonal, and we may expect trends 
to be sustained for short periods at least. But trends will change. If we have 
a successful invention, sales will increase initially but then stabilise before de- 
clining as competitors enter the market. We will refer to the change in level 
from one time period to the next as the slope.¹ Seasonal patterns can also
change due to vagaries of fashion and variation in climate, for example. The 
Holt-Winters method was suggested by Holt (1957) and Winters (1960), who 
were working in the School of Industrial Administration at Carnegie Institute 
of Technology, and uses exponentially weighted moving averages to update 
estimates of the seasonally adjusted mean (called the level), slope, and sea- 
sonals. 

The Holt-Winters method generalises Equation (3.15), and the additive
seasonal form of their updating equations for a series $\{x_t\}$ with period $p$ is

$$\begin{aligned}
a_t &= \alpha(x_t - s_{t-p}) + (1 - \alpha)(a_{t-1} + b_{t-1}) \\
b_t &= \beta(a_t - a_{t-1}) + (1 - \beta)b_{t-1} \\
s_t &= \gamma(x_t - a_t) + (1 - \gamma)s_{t-p}
\end{aligned} \tag{3.21}$$

where $a_t$, $b_t$, and $s_t$ are the estimated level,² slope, and seasonal effect at time
$t$, and $\alpha$, $\beta$, and $\gamma$ are the smoothing parameters. The first updating equation
takes a weighted average of our latest observation, with our existing estimate
of the appropriate seasonal effect subtracted, and our forecast of the level
made one time step ago. The one-step-ahead forecast of the level is the sum
of the estimates of the level and slope at the time of forecast. A typical choice
of the weight $\alpha$ is 0.2. The second equation takes a weighted average of our
previous estimate and latest estimate of the slope, which is the difference in
the estimated level at time $t$ and the estimated level at time $t - 1$. Note that
the second equation can only be used after the first equation has been applied
to get $a_t$. Finally, we have another estimate of the seasonal effect, from the
difference between the observation and the estimate of the level, and we take
a weighted average of this and the last estimate of the seasonal effect for this
season, which was made at time $t - p$. Typical choices of the weights $\beta$ and $\gamma$
are 0.2. The updating equations can be started with $a_1 = x_1$ and initial slope,
$b_1$, and seasonal effects, $s_1, \ldots, s_p$, reckoned from experience, estimated from
the data in some way, or set at 0. The default in R is to use values obtained
from the decompose procedure.

The forecasting equation for $x_{n+k}$ made after the observation at time $n$ is

$$\hat{x}_{n+k|n} = a_n + kb_n + s_{n+k-p} \qquad k \le p \tag{3.22}$$


¹ When describing the Holt-Winters procedure, the R help and many textbooks
refer to the slope as the trend.

² The mean of the process is the sum of the level and the appropriate seasonal
effect.
where $a_n$ is the estimated level and $b_n$ is the estimated slope, so $a_n + kb_n$ is the
expected level at time $n + k$ and $s_{n+k-p}$ is the exponentially weighted estimate
of the seasonal effect made at time $n + k - p$. For example, for monthly data
($p = 12$), if time $n + 1$ occurs in January, then $s_{n+1-12}$ is the exponentially
weighted estimate of the seasonal effect for January made in the previous year.
The forecasting equation can be used for lead times between $(m - 1)p + 1$ and
$mp$, but then the most recent exponentially weighted estimate of the seasonal
effect available will be $s_{n+k-mp}$.
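The additive updating equations (3.21) and the forecasting equation (3.22) can be traced by hand on a small artificial series. In the sketch below, the function name hw_additive is ours, and the starting values are supplied at time 0 (level, slope, and one full cycle of seasonal effects), which is a simplifying assumption rather than the starting rule used by R. The series is noise-free, so the updates should leave the level, slope, and seasonals unchanged in form and the forecasts should continue the pattern exactly.

```r
# Hypothetical helper implementing the additive updating equations (3.21).
# a0, b0: level and slope at time 0; s0: seasonal effects for times 1-p, ..., 0
# (an assumed starting convention, chosen to keep the indexing simple).
hw_additive <- function(x, p, alpha, beta, gamma, a0, b0, s0) {
  n <- length(x)
  a <- a0; b <- b0
  s <- c(s0, numeric(n))                  # s[t + p] stores s_t, so s[t] is s_{t-p}
  for (t in seq_len(n)) {
    a_new <- alpha * (x[t] - s[t]) + (1 - alpha) * (a + b)
    b <- beta * (a_new - a) + (1 - beta) * b
    s[t + p] <- gamma * (x[t] - a_new) + (1 - gamma) * s[t]
    a <- a_new
  }
  list(level = a, slope = b, seasonal = s[(n + 1):(n + p)])
}

# Noise-free quarterly series: level 10, slope 0.5, fixed seasonal pattern
p <- 4
seas <- c(2, -1, -3, 2)
tt <- 1:20
x <- 10 + 0.5 * tt + rep(seas, length.out = 20)
fit <- hw_additive(x, p, alpha = 0.2, beta = 0.2, gamma = 0.2,
                   a0 = 10, b0 = 0.5, s0 = seas)

# Forecasts from Equation (3.22) for lead times k = 1, ..., p
k <- 1:p
forecast <- fit$level + k * fit$slope + fit$seasonal[k]
```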
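The Holt-Winters algorithm with multiplicative seasonals is

$$\begin{aligned}
a_n &= \alpha\left(\frac{x_n}{s_{n-p}}\right) + (1 - \alpha)(a_{n-1} + b_{n-1}) \\
b_n &= \beta(a_n - a_{n-1}) + (1 - \beta)b_{n-1} \\
s_n &= \gamma\left(\frac{x_n}{a_n}\right) + (1 - \gamma)s_{n-p}
\end{aligned} \tag{3.23}$$

The forecasting equation for $x_{n+k}$ made after the observation at time $n$
becomes

$$\hat{x}_{n+k|n} = (a_n + kb_n)s_{n+k-p} \qquad k \le p \tag{3.24}$$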


In R, the function HoltWinters can be used to estimate smoothing parameters
for the Holt-Winters model by minimising the sum of squared one-step-ahead
prediction errors (SS1PE).


Sales of Australian wine 


The data in the file wine.dat are monthly sales of Australian wine by category,
in thousands of litres, from January 1980 until July 1995. The categories are 
fortified white, dry white, sweet white, red, rose, and sparkling. The sweet 
white wine time series is plotted in Figure 3.9, and there is a dramatic increase 
in sales in the second half of the 1980s followed by a reduction to a level well 
above the starting values. The seasonal variation looks as though it would be 
better modelled as multiplicative, and comparison of the SS1PE for the fitted 
models confirms this (Exercise 6). Here we present results for the model with 
multiplicative seasonals only. The Holt-Winters components and fitted values 
are shown in Figures 3.10 and 3.11 respectively. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/wine.dat"
> wine.dat <- read.table(www, header = T) ; attach(wine.dat)
> sweetw.ts <- ts(sweetw, start = c(1980,1), freq = 12)
> plot(sweetw.ts, xlab= "Time (months)", ylab = "sales (1000 litres)")
> sweetw.hw <- HoltWinters(sweetw.ts, seasonal = "mult")
> sweetw.hw ; sweetw.hw$coef ; sweetw.hw$SSE
Smoothing parameters:
 alpha:  0.4107
 beta :  0.0001516
 gamma:  0.4695

> sqrt(sweetw.hw$SSE/length(sweetw))
[1] 50.04
> sd(sweetw)
[1] 121.4

> plot(sweetw.hw$fitted)
> plot(sweetw.hw)
Fig. 3.9. Sales of Australian sweet white wine.


The optimum values for the smoothing parameters, based on minimising
the sum of squared one-step-ahead prediction errors, are 0.4107, 0.0001516,
and 0.4695 for $\alpha$, $\beta$, and $\gamma$, respectively. It follows that the level and
seasonal variation adapt rapidly whereas the trend is slow to do so. The
coefficients are the estimated values of the level, slope, and multiplicative
seasonals from January to December available at the latest time point
($t = n = 187$), and these are the values that will be used for predictions
(Exercise 6). Finally, we have calculated the root mean square one-step-ahead
prediction error, which equals 50, and have compared it with the standard
deviation of the original time series, which is 121. The decrease is substantial,
but a more testing comparison would be with the root mean square
one-step-ahead prediction error if we forecast the next month's sales as equal
to this month's sales (Exercise 6). Also, in Exercise 6 you are
asked to investigate the performance of the Holt-Winters algorithm if the
three smoothing parameters are all set equal to 0.2 and if the values for the
parameters are optimised at each time step.
Fig. 3.10. Sales of Australian white wine: fitted values; level; slope (labelled trend);
seasonal variation.

Fig. 3.11. Sales of Australian white wine and Holt-Winters fitted values.


3.4.3 Four-year-ahead forecasts for the air passenger data 


The seasonal effect for the air passenger data of §1.4.1 appeared to increase 
with the trend, which suggests that a ‘multiplicative’ seasonal component be 
used in the Holt-Winters procedure. The Holt-Winters fit is impressive — see 
Figure 3.12. The predict function in R can be used with the fitted model to 
make forecasts into the future (Fig. 3.13). 


> AP.hw <- HoltWinters(AP, seasonal = "mult")
> plot(AP.hw)
> AP.predict <- predict(AP.hw, n.ahead = 4 * 12)
> ts.plot(AP, AP.predict, lty = 1:2)
Fig. 3.13. Holt-Winters forecasts for air passenger data for 1961-1964 shown as
dotted lines.


The estimates of the model parameters, which can be obtained from
AP.hw$alpha, AP.hw$beta, and AP.hw$gamma, are $\hat{\alpha} = 0.274$, $\hat{\beta} = 0.0175$,
and $\hat{\gamma} = 0.877$. It should be noted that the extrapolated forecasts are based
entirely on the trends in the period during which the model was fitted and
would be a sensible prediction assuming these trends continue. Whilst the
extrapolation in Figure 3.13 looks visually appropriate, unforeseen events could
lead to future values completely different from those shown here.
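The forecasts returned by predict can be related back to Equation (3.24). The sketch below assumes that AP is the built-in AirPassengers series and that the fitted coefficients are named a, b, and s1 to s12, as they appear when the fitted object is printed; it recomputes the first twelve forecasts by hand. The estimated smoothing parameters may differ somewhat between versions of R, so no particular values are asserted here.

```r
AP <- AirPassengers                      # assumed to be the series used in the text
AP.hw <- HoltWinters(AP, seasonal = "mult")
cf <- coef(AP.hw)                        # named a, b, s1, ..., s12

# Equation (3.24) for lead times k = 1, ..., 12
k <- 1:12
manual <- unname((cf["a"] + k * cf["b"]) * cf[paste0("s", k)])
from_predict <- as.numeric(predict(AP.hw, n.ahead = 12))

max(abs(manual - from_predict))          # should be negligible
```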


3.5 Summary of commands used in examples 
nls           non-linear least squares fit
HoltWinters   estimates the parameters of the Holt-Winters
              or exponential smoothing model
predict       forecasts future values
ts.union      create the union of two series
coef          extracts the coefficients of a fitted model


3.6 Exercises 


1. a) Describe the association and calculate the ccf between x and y for k 
equal to 1, 10, and 100. 
> w <- 1:100
> x <- w + k * rnorm(100)
> y <- w + k * rnorm(100)
> ccf(x, y)
b) Describe the association between x and y, and calculate the ccf.
> Time <- 1:370
> x <- sin(2 * pi * Time / 37)
> y <- sin(2 * pi * (Time + 4) / 37)
Investigate the effect of adding independent random variation to x 
and y. 


2. a) Let f(t) be the density of time T to purchase for a randomly selected
purchaser. Show that

$$P(\text{Buys in next time increment } \delta t \mid \text{no purchase by time } t) = h(t)\,\delta t$$

b) The survivor function S(t) is defined as the complement of the cdf

$$S(t) = 1 - F(t)$$

Show that $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$ and $E[T] = \int_0^\infty S(t)\,dt$.


c) Explain how Equation (3.6) is the discrete form of Equation (3.9). 


3. a) Verify that the solution of Equation (3.8), with h(t) given by Equation 
(3.9), for F(t) is Equation (3.10). 
b) The logistic distribution has the cdf: $F(t) = \{1 + \exp(-(t - \mu)/b)\}^{-1}$,
with mean $\mu$ and standard deviation $b\pi/\sqrt{3}$. Plot the cdf of the logistic
distribution with a mean 0 and standard deviation 1 against the cdf
of the standard normal distribution.

c) Show that the time to peak of the Bass curve is given by Equation 
(3.13). What does this reduce to for the exponential and logistic dis- 
tributions? 


4. The Independent on July 11, 2008 reported the launch of Apple's iPhone. 
A Deutsche Bank analyst predicted Apple would sell 10.5 million units 
during the year. The company was reported to have a target of 10 million 
units worldwide for 2008. Initial demand is predicted to exceed supply. 
Carphone Warehouse reportedly sold their online allocations within hours 
and expect to sell out at most of their UK shops. The report stated that 
there were 60,000 applications for 500 iPhones on the Hutchison Telecom- 
munications website in Hong Kong. 

a) Why is a Bass model without replacement or multiple purchases likely 
to be realistic for this product? 

b) Suggest plausible values for the parameters p, q, and m for the model 
in (a), and give a likely range for these parameters. How does the 
shape of the cumulative sales curve vary with the parameter values? 

c) How could you allow for high initial sales with the Bass model? 


5. a) Write the sum of n terms in a geometric progression with a first term
a and a common ratio r as

$$S_n = a + ar + ar^2 + \ldots + ar^{n-1}$$

Subtract $rS_n$ from $S_n$ and rearrange to obtain the formula for the sum
of n terms:

$$S_n = \frac{a(1 - r^n)}{1 - r}$$

b) Under what conditions does the sum of n terms of a geometric
progression tend to a finite sum as n tends to infinity? What is this sum?

c) Obtain an expression for the sum of the weights in an EWMA if we
specify $a_1 = x_1$ in Equation (3.15).

d) Suppose $x_t$ happens to be a sequence of independent variables with a
constant mean and a constant variance $\sigma^2$. What is the variance of $a_t$
if we specify $a_1 = x_1$ in Equation (3.15)?


6. Refer to the sweet white wine sales (§3.4.2).
a) Use the HoltWinters procedure with $\alpha$, $\beta$, and $\gamma$ set to 0.2 and
compare the SS1PE with the minimum obtained with R.
b) Use the HoltWinters procedure on the logarithms of sales and
compare SS1PE with that obtained using sales.
c) What is the SS1PE if you predict next month's sales will equal this
month's sales?

d) This is rather harder: What is the SS1PE if you find the optimum $\alpha$,
$\beta$, and $\gamma$ from the data available at each time step before making the
one-step-ahead prediction?


7. Continue the following exploratory time series analysis using the global
temperature series from §1.4.5.

a) Produce a time plot of the data. Plot the aggregated annual mean 
series and a boxplot that summarises the observed values for each 
season, and comment on the plots. 

b) Decompose the series into the components trend, seasonal effect, and 
residuals, and plot the decomposed series. Produce a plot of the trend 
with a superimposed seasonal effect. 

c) Plot the correlogram of the residuals from question 7b. Comment on
the plot, explaining any 'significant' correlations at significant lags.

d) Fit an appropriate Holt-Winters model to the monthly data. Explain
why you chose that particular Holt-Winters model, and give the
parameter estimates.

e) Using the fitted model, forecast values for the years 2005-2010. Add 
these forecasts to a time plot of the original series. Under what cir- 
cumstances would these forecasts be valid? What comments of cau- 
tion would you make to an economist or politician who wanted to 
use these forecasts to make statements about the potential impact of 
global warming on the world economy? 


8. A cumulative sum plot is useful for monitoring changes in the mean of a
process. If we have a time series composed of observations $x_t$ at times $t$
with a target value of $\tau$, the CUSUM chart is a plot of the cumulative
sums of the deviations from target, $cs_t$, against $t$. The formula for $cs_t$ at
time $t$ is

$$cs_t = \sum_{i=1}^{t} (x_i - \tau)$$

The R function cumsum calculates a cumulative sum. Plot the CUSUM for
the motoring organisation complaints with a target of 18.


9. Using the motor organisation complaints series, refit the exponential
smoothing model with weights $\alpha = 0.01$ and $\alpha = 0.99$. In each case,
extract the last residual from the fitted model and verify that the last
residual satisfies Equation (3.19). Redraw Figure 3.8 using the new values
of $\alpha$, and comment on the plots, explaining the main differences.
4.1 Purpose


So far, we have considered two approaches for modelling time series. The 
first is based on an assumption that there is a fixed seasonal pattern about a 
trend. We can estimate the trend by local averaging of the deseasonalised data, 
and this is implemented by the R function decompose. The second approach 
allows the seasonal variation and trend, described in terms of a level and slope, 
to change over time and estimates these features by exponentially weighted 
averages. We used the HoltWinters function to demonstrate this method. 

When we fit mathematical models to time series data, we refer to the dis- 
crepancies between the fitted values, calculated from the model, and the data 
as a residual error series. If our model encapsulates most of the deterministic 
features of the time series, our residual error series should appear to be a re- 
alisation of independent random variables from some probability distribution. 
However, we often find that there is some structure in the residual error series, 
such as consecutive errors being positively correlated, which we can use to im- 
prove our forecasts and make our simulations more realistic. We assume that 
our residual error series is stationary, and in Chapter 6 we introduce models 
for stationary time series. 

Since we judge a model to be a good fit if its residual error series appears 
to be a realisation of independent random variables, it seems natural to build 
models up from a model of independent random variation, known as discrete 
white noise. The name ‘white noise’ was coined in an article on heat radiation 
published in Nature in April 1922, where it was used to refer to series that 
contained all frequencies in equal proportions, analogous to white light. The 
term purely random is sometimes used for white noise series. In §4.3 we define a 
fundamental non-stationary model based on discrete white noise that is called 
the random walk. It is sometimes an adequate model for financial series and is 
often used as a standard against which the performance of more complicated 
models can be assessed. 


P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 67 
Use R, DOI 10.1007 /978-0-387-88698-5_4, 
© Springer Science+Business Media, LLC 2009 
4.2 White noise


4.2.1 Introduction 


A residual error is the difference between the observed value and the model
predicted value at time $t$. If we suppose the model is defined for the variable
$y_t$ and $\hat{y}_t$ is the value predicted by the model, the residual error $x_t$ is

$$x_t = y_t - \hat{y}_t \tag{4.1}$$

As the residual errors occur in time, they form a time series: $x_1, x_2, \ldots, x_n$.
In Chapter 2, we found that features of the historical series, such as the 
trend or seasonal variation, are reflected in the correlogram. Thus, if a model 
has accounted for all the serial correlation in the data, the residual series would 
be serially uncorrelated, so that a correlogram of the residual series would 
exhibit no obvious patterns. This ideal motivates the following definition. 


4.2.2 Definition 


A time series $\{w_t : t = 1, 2, \ldots, n\}$ is discrete white noise (DWN) if the
variables $w_1, w_2, \ldots, w_n$ are independent and identically distributed with a
mean of zero. This implies that the variables all have the same variance $\sigma^2$
and $\text{Cor}(w_i, w_j) = 0$ for all $i \neq j$. If, in addition, the variables also follow a
normal distribution (i.e., $w_t \sim N(0, \sigma^2)$) the series is called Gaussian white
noise.


4.2.3 Simulation in R 


A fitted time series model can be used to simulate data. Time series simulated 
using a model are sometimes called synthetic series to distinguish them from 
an observed historical series. 

Simulation is useful for many reasons. For example, simulation can be used 
to generate plausible future scenarios and to construct confidence intervals for 
model parameters (sometimes called bootstrapping). In R, simulation is usu- 
ally straightforward, and most standard statistical distributions are simulated 
using a function that has an abbreviated name for the distribution prefixed 
with an ‘r’ (for ‘random’).¹ For example, rnorm(100) is used to simulate 100
independent standard normal variables, which is equivalent to simulating a 
Gaussian white noise series of length 100 (Fig. 4.1). 


> set.seed(1)
> w <- rnorm(100)
> plot(w, type = "l")


¹ Other prefixes are also available to calculate properties for standard distributions;
e.g., the prefix ‘d’ is used to calculate the probability (density) function. See the 
R help (e.g., ?dnorm) for more details. 
Fig. 4.1. Time plot of simulated Gaussian white noise series.


Simulation experiments in R can easily be repeated using the ‘up’ arrow 
on the keyboard. For this reason, it is sometimes preferable to put all the 
commands on one line, separated by ‘;’, or to nest the functions; for example, 
a plot of a white noise series is given by plot(rnorm(100), type="l").

The function set.seed is used to provide a starting point (or seed) in 
the simulations, thus ensuring that the simulations can be reproduced. If this 
function is left out, a different set of simulated data are obtained, although 
the underlying statistical properties remain unchanged. To see this, rerun the 
plot above a few times with and without set.seed(1). 
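The effect of set.seed can also be checked directly, without plotting. In this short sketch (the variable names w1, w2, and w3 are ours), resetting the seed regenerates the same series, whereas continuing without a reset gives a different one.

```r
set.seed(1); w1 <- rnorm(100)
set.seed(1); w2 <- rnorm(100)   # same seed: the same series is regenerated
w3 <- rnorm(100)                # no reset: the generator carries on from its current state

identical(w1, w2)
identical(w1, w3)
```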

To illustrate by simulation how samples may differ from their underlying
populations, consider the following histogram of a Gaussian white noise series. 
Type the following to view the plot (which is not shown in the text): 


> x <- seq(-3,3, length = 1000)
> hist(rnorm(100), prob = T); points(x, dnorm(x), type = "l")


Repetitions of the last command, which can be obtained using the ‘up’ arrow 
on your keyboard, will show a range of different sample distributions that 
arise when the underlying distribution is normal. Distributions that depart 
from the plotted curve have arisen due to sampling variation. 


4.2.4 Second-order properties and the correlogram 


The second-order properties of a white noise series $\{w_t\}$ are an immediate
consequence of the definition in §4.2.2. However, as they are needed so often 
in the derivation of the second-order properties for more complex models, we 
explicitly state them here: 
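$$\begin{aligned}
\mu_w &= 0 \\
\gamma_k &= \text{Cov}(w_t, w_{t+k}) = \begin{cases} \sigma^2 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases}
\end{aligned} \tag{4.2}$$

The autocorrelation function follows as

$$\rho_k = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \neq 0 \end{cases} \tag{4.3}$$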


Simulated white noise data will not have autocorrelations that are exactly 
zero (when $k \neq 0$) because of sampling variation. In particular, for a simu-
lated white noise series, it is expected that 5% of the autocorrelations will 
be significantly different from zero at the 5% significance level, shown as dot- 
ted lines on the correlogram. Try repeating the following command to view a 
range of correlograms that could arise from an underlying white noise series. 
A typical plot, with one statistically significant autocorrelation, occurring at 
lag 7, is shown in Figure 4.2. 


> set.seed(2) 
> acf(rnorm(100)) 
Fig. 4.2. Correlogram of a simulated white noise series. The underlying
autocorrelations are all zero (except at lag 0); the statistically significant value at lag 7 is due
to sampling variation.
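The lags that fall outside the dotted lines can also be listed numerically rather than read off the plot. The sketch below assumes the lines are drawn at plus and minus qnorm(0.975) divided by the square root of the series length, which is the usual approximate 5% significance bound for white noise; the names r and bound are ours.

```r
set.seed(2)
w <- rnorm(100)
r <- acf(w, plot = FALSE)$acf[-1]        # drop lag 0, which is always 1
bound <- qnorm(0.975) / sqrt(length(w))  # assumed form of the dotted lines, about 0.196
which(abs(r) > bound)                    # lags flagged as significant
```

Repeating this without set.seed gives a different set of flagged lags each time, which is the sampling variation described above.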


4.2.5 Fitting a white noise model 


A white noise series usually arises as a residual series after fitting an appropri- 
ate time series model. The correlogram generally provides sufficient evidence, 
provided the series is of a reasonable length, to support the conjecture that
the residuals are well approximated by white noise. 

The only parameter for a white noise series is the variance c?, which is 
estimated by the residual variance, adjusted by degrees of freedom, given in 
the computer output of the fitted model. If your analysis begins on data that 
are already approximately white noise, then only o? needs to be estimated, 
which is readily achieved using the var function. 
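
A minimal sketch of the second case, assuming the data are already approximately white noise. The true standard deviation of 2 is an arbitrary choice made for illustration, and `sigma2_hat` is a hypothetical name.

```r
# Estimate the white noise variance directly with var().
set.seed(1)
w <- rnorm(1000, mean = 0, sd = 2)  # simulated white noise, true variance 4

sigma2_hat <- var(w)                # sample variance, divisor n - 1
sigma2_hat
```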


4.3 Random walks 
4.3.1 Introduction 


In Chapter 1, the exchange rate data were examined and found to exhibit 
stochastic trends. A random walk often provides a good fit to data with 
stochastic trends, although even better fits are usually obtained from more 
general model formulations, such as the ARIMA models of Chapter 7. 


4.3.2 Definition 


Let $\{x_t\}$ be a time series. Then $\{x_t\}$ is a random walk if 

$$
x_t = x_{t-1} + w_t
\tag{4.4}
$$

where $\{w_t\}$ is a white noise series. Substituting $x_{t-1} = x_{t-2} + w_{t-1}$ in Equation 
(4.4) and then substituting for $x_{t-2}$, followed by $x_{t-3}$ and so on (a process 
known as ‘back substitution’) gives: 

$$
x_t = w_t + w_{t-1} + w_{t-2} + \ldots
\tag{4.5}
$$

In practice, the series above will not be infinite but will start at some time 
$t = 1$. Hence, 

$$
x_t = w_1 + w_2 + \ldots + w_t
\tag{4.6}
$$


Back substitution is used to define more complex time series models and 
also to derive second-order properties. The procedure occurs so frequently in 
the study of time series models that the following definition is needed. 


4.3.3 The backward shift operator 


The backward shift operator $B$ is defined by 

$$
B x_t = x_{t-1}
\tag{4.7}
$$

The backward shift operator is sometimes called the ‘lag operator’. By repeat- 
edly applying $B$, it follows that 

$$
B^n x_t = x_{t-n}
\tag{4.8}
$$

Using $B$, Equation (4.4) can be rewritten as 

$$
x_t = B x_t + w_t \;\Rightarrow\; (1 - B) x_t = w_t \;\Rightarrow\; x_t = (1 - B)^{-1} w_t
$$

$$
\Rightarrow\; x_t = (1 + B + B^2 + \ldots) w_t \;\Rightarrow\; x_t = w_t + w_{t-1} + w_{t-2} + \ldots
$$

and Equation (4.5) is recovered. 


4.3.4 Random walk: Second-order properties 


The second-order properties of a random walk follow as 

$$
\mu_x = 0, \qquad \gamma_k(t) = \mathrm{Cov}(x_t, x_{t+k}) = t\sigma^2
\tag{4.9}
$$

The covariance is a function of time, so the process is non-stationary. In 
particular, the variance is $t\sigma^2$ and so it increases without limit as $t$ increases. It 
follows that a random walk is only suitable for short term predictions. 

The time-varying autocorrelation function for $k > 0$ follows from Equation 
(4.9) as 

$$
\rho_k(t) = \frac{\mathrm{Cov}(x_t, x_{t+k})}{\sqrt{\mathrm{Var}(x_t)\,\mathrm{Var}(x_{t+k})}}
= \frac{t\sigma^2}{\sqrt{t\sigma^2 (t+k)\sigma^2}}
= \frac{1}{\sqrt{1 + k/t}}
\tag{4.10}
$$

so that, for large $t$ with $k$ considerably less than $t$, $\rho_k(t)$ is nearly 1. Hence, the 
correlogram for a random walk is characterised by positive autocorrelations 
that decay very slowly down from unity. This is demonstrated by simulation 
in §4.3.7. 
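
Equations (4.9) and (4.10) can be checked informally by simulating many independent random walks and looking across replicates at fixed times. This is only a sketch under assumed settings ($\sigma^2 = 1$, 2000 replicates of length 50); the names `walks`, `v`, and `rho` are illustrative.

```r
# Monte Carlo check of Var(x_t) = t * sigma^2 and rho_k(t) = 1 / sqrt(1 + k/t)
set.seed(1)
n_rep <- 2000                       # number of independent random walks
n_t   <- 50                         # length of each walk

# Each column is one random walk built from standard normal white noise
walks <- replicate(n_rep, cumsum(rnorm(n_t)))

v <- apply(walks, 1, var)           # variance across replicates at each time t
round(v[c(10, 25, 50)], 1)          # compare with t * sigma^2 = 10, 25, 50

rho <- function(k, t) 1 / sqrt(1 + k / t)    # Equation (4.10)
r_hat <- cor(walks[25, ], walks[35, ])       # sample correlation, t = 25, k = 10
c(sample = r_hat, theory = rho(10, 25))
```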


4.3.5 Derivation of second-order properties* 


Equation (4.6) is a finite sum of white noise terms, each with zero mean and 
variance $\sigma^2$. Hence, the mean of $x_t$ is zero (Equation (4.9)). The autocovari- 
ance in Equation (4.9) can be derived using Equation (2.15) as follows: 

$$
\gamma_k(t) = \mathrm{Cov}(x_t, x_{t+k})
= \mathrm{Cov}\left( \sum_{i=1}^{t} w_i, \sum_{j=1}^{t+k} w_j \right)
= \sum_{i=j} \mathrm{Cov}(w_i, w_j) = t\sigma^2
$$


4.3.6 The difference operator 


Differencing adjacent terms of a series can transform a non-stationary series 
to a stationary series. For example, if the series $\{x_t\}$ is a random walk, it 
is non-stationary. However, from Equation (4.4), the first-order differences of 
$\{x_t\}$ produce the stationary white noise series $\{w_t\}$ given by $x_t - x_{t-1} = w_t$. 
Hence, differencing turns out to be a useful ‘filtering’ procedure in the study 
of non-stationary time series. The difference operator $\nabla$ is defined by 

$$
\nabla x_t = x_t - x_{t-1}
\tag{4.11}
$$

Note that $\nabla x_t = (1 - B) x_t$, so that $\nabla$ can be expressed in terms of the back- 
ward shift operator $B$. In general, higher-order differencing can be expressed 
as 

$$
\nabla^n = (1 - B)^n
\tag{4.12}
$$


The proof of the last result is left to Exercise 7. 
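
For $n = 2$, Equation (4.12) gives $\nabla^2 x_t = (1 - B)^2 x_t = x_t - 2x_{t-1} + x_{t-2}$. The sketch below compares that expansion with R's `diff(x, differences = 2)` on an arbitrary simulated series; `d2_manual` and `d2_diff` are illustrative names.

```r
# Second-order differencing: expansion of (1 - B)^2 versus diff()
set.seed(1)
x <- cumsum(rnorm(20))              # any series will do; here a short random walk
n <- length(x)

# x_t - 2 x_{t-1} + x_{t-2} for t = 3, ..., n
d2_manual <- x[3:n] - 2 * x[2:(n - 1)] + x[1:(n - 2)]

# Built-in second-order differences
d2_diff <- diff(x, differences = 2)

all.equal(d2_manual, d2_diff)
```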


4.3.7 Simulation 


It is often helpful to study a time series model by simulation. This enables the 
main features of the model to be observed in plots, so that when historical data 
exhibit similar features, the model may be selected as a potential candidate. 
The following commands can be used to simulate random walk data for x: 


> x <- w <- rnorm(1000) 
> for (t in 2:1000) x[t] <- x[t - 1] + w[t] 
> plot(x, type = "l") 


The first command above places a white noise series into w and uses this 
series to initialise x. The ‘for’ loop then generates the random walk using 
Equation (4.4); the correspondence between the R code above and Equation 
(4.4) should be noted. The series is plotted and shown in Figure 4.3.² 

A correlogram of the series is obtained from acf(x) and is shown in Fig- 
ure 4.4; a gradual decay in the correlations is evident in the figure, thus 
supporting the theoretical results in §4.3.4. 
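
The loop used in this simulation mirrors Equation (4.4). Equation (4.6) suggests an equivalent vectorised form, since a random walk started at $t = 1$ is a cumulative sum of white noise. The following sketch compares the two; the names `x_loop` and `x_cumsum` are illustrative.

```r
# A random walk two ways: recursive loop (Equation 4.4) and cumulative sum (Equation 4.6)
set.seed(1)
w <- rnorm(1000)

x_loop <- w
for (t in 2:1000) x_loop[t] <- x_loop[t - 1] + w[t]

x_cumsum <- cumsum(w)

all.equal(x_loop, x_cumsum)
```

The loop is kept in the text because it generalises directly to the autoregressive models later in the chapter, whereas `cumsum` only covers the random walk.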

Throughout this book, we will often fit models to data that we have simu- 
lated and attempt to recover the underlying model parameters. At first sight, 
this might seem odd, given that the parameters are used to simulate the data 
so that we already know at the outset the values the parameters should take. 
However, the procedure is useful for a number of reasons. In particular, to 
be able to simulate data using a model requires that the model formulation 
be correctly understood. If the model is understood but incorrectly imple- 
mented, then the parameter estimates from the fitted model may deviate 
significantly from the underlying model values used in the simulation. Simu- 
lation can therefore help ensure that the model is both correctly understood 
and correctly implemented. 


² To obtain the same simulation and plot, it is necessary to have run the previous 
code in §4.2.4 first, which sets the random number seed. 

Fig. 4.3. Time plot of a simulated random walk. The series exhibits an increasing 
trend. However, this is purely stochastic and due to the high serial correlation. 

Fig. 4.4. The correlogram for the simulated random walk. A gradual decay from a 
high serial correlation is a notable feature of a random walk series. 


4.4 Fitted models and diagnostic plots 


4.4.1 Simulated random walk series 


The first-order differences of a random walk are a white noise series, so the 
correlogram of the series of differences can be used to assess whether a given 
series is reasonably modelled as a random walk. 


> acf(diff(x)) 


As can be seen in Figure 4.5, there are no obvious patterns in the correlogram, 
with only a couple of marginally statistically significant values. These 
significant values can be ignored because they are small in magnitude and about 
5% of the values are expected to be statistically significant even when the 
underlying values are zero (§2.3). Thus, as expected, there is good evidence 
that the simulated series in x follows a random walk. 

Fig. 4.5. Correlogram of differenced series. If a series follows a random walk, the 
differenced series will be white noise. 


4.4.2 Exchange rate series 


The correlogram of the first-order differences of the exchange rate data from 
§1.4.4 can be obtained from acf(diff(Z.ts)) and is shown in Figure 4.6. 

A significant value occurs at lag 1, suggesting that a more complex model 
may be needed, although the lack of any other significant values in the cor- 
relogram does suggest that the random walk provides a good approximation 
for the series (Fig. 4.6). An additional term can be added to the random 
walk model using the Holt-Winters procedure, allowing the parameter $\beta$ to 
be non-zero but still forcing the seasonal term $\gamma$ to be zero: 


> Z.hw <- HoltWinters(Z.ts, alpha = 1, gamma = 0) 
> acf(resid(Z.hw)) 


Figure 4.7 shows the correlogram of the residuals from the fitted Holt- 
Winters model. This correlogram is more consistent with a hypothesis that 
the residual series is white noise. Using Equation (3.21), with the 
parameter estimates obtained from Z.hw$alpha and Z.hw$beta, the fitted 
model can be expressed as 

$$
\begin{aligned}
x_t &= x_{t-1} + b_{t-1} + w_t \\
b_{t-1} &= 0.167 (x_{t-1} - x_{t-2}) + 0.833\, b_{t-2}
\end{aligned}
\tag{4.13}
$$

where $\{w_t\}$ is white noise with zero mean. 

After some algebra, Equations (4.13) can be expressed as one equation 
in terms of the backward shift operator: 

$$
(1 - 0.167B + 0.167B^2)(1 - B) x_t = w_t
\tag{4.14}
$$

Equation (4.14) is a special case (the integrated autoregressive model) 
within the important class of models known as ARIMA models (Chap- 
ter 7). The proof of Equation (4.14) is left to Exercise 8. 

Fig. 4.6. Correlogram of first-order differences of the exchange rate series (UK 
pounds to NZ dollars, 1991-2000). The significant value at lag 1 indicates that an 
extension of the random walk model is needed for this series. 

Fig. 4.7. The correlogram of the residuals from the fitted Holt-Winters model for the 
exchange rate series (UK pounds to NZ dollars, 1991-2000). There are no significant 
correlations in the residual series, so the model provides a reasonable approximation 
to the exchange rate data. 


4.4.3 Random walk with drift 


Company stockholders generally expect their investment to increase in value 
despite the volatility of financial markets. The random walk model can be 
adapted to allow for this by including a drift parameter $\delta$: 

$$
x_t = x_{t-1} + \delta + w_t
$$

Closing prices (US dollars) for Hewlett-Packard Company stock for 672 
trading days up to June 7, 2007 are read into R and plotted (see the code 
below and Fig. 4.8). The lag 1 differences are calculated using diff() and 
plotted in Figure 4.9. The correlogram of the differences is in Figure 4.10, and 
they appear to be well modelled as white noise. The mean of the differences is 
0.0399, and this is our estimate of the drift parameter. The standard deviation 
of the 671 differences is 0.460, and an approximate 95% confidence interval 
for the drift parameter is [0.004, 0.075]. Since this interval does not include 0, 
we have evidence of a positive drift over this period. 

Fig. 4.8. Daily closing prices of Hewlett-Packard stock. 

Fig. 4.9. Lag 1 differences of daily closing prices of Hewlett-Packard stock. 

Fig. 4.10. Acf of lag 1 differences of daily closing prices of Hewlett-Packard stock. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/HP.txt" 
> HP.dat <- read.table(www, header = TRUE) ; attach(HP.dat) 
> plot(as.ts(Price)) 
> DP <- diff(Price) ; plot(as.ts(DP)) ; acf(DP) 
> mean(DP) + c(-2, 2) * sd(DP)/sqrt(length(DP)) 
[1] 0.004378 0.075353 
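
The same calculation can be tried on simulated data, where the drift is known in advance. The sketch below is not the Hewlett-Packard analysis: the drift 0.04, standard deviation 0.46, and starting price 30 are assumed values chosen only to resemble the magnitudes quoted above, and all object names are illustrative.

```r
# Simulate a random walk with drift and recover the drift from the differences
set.seed(1)
n     <- 672                        # number of simulated 'trading days'
delta <- 0.04                       # assumed drift parameter
sigma <- 0.46                       # assumed white noise standard deviation

# x_t = x_{t-1} + delta + w_t, started at an arbitrary level of 30
x <- cumsum(c(30, rnorm(n - 1, mean = delta, sd = sigma)))

dx <- diff(x)                       # lag 1 differences: delta + w_t
delta_hat <- mean(dx)               # estimate of the drift
ci <- delta_hat + c(-2, 2) * sd(dx) / sqrt(length(dx))   # approximate 95% interval

delta_hat
ci
```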
4.5 Autoregressive models 


4.5.1 Definition 


The series $\{x_t\}$ is an autoregressive process of order $p$, abbreviated to AR(p), 
if 

$$
x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \ldots + \alpha_p x_{t-p} + w_t
\tag{4.15}
$$

where $\{w_t\}$ is white noise and the $\alpha_i$ are the model parameters with $\alpha_p \neq 0$ 
for an order $p$ process. Equation (4.15) can be expressed as a polynomial of 
order $p$ in terms of the backward shift operator: 

$$
\theta_p(B) x_t = (1 - \alpha_1 B - \alpha_2 B^2 - \ldots - \alpha_p B^p) x_t = w_t
\tag{4.16}
$$

The following points should be noted: 

(a) The random walk is the special case AR(1) with $\alpha_1 = 1$ (see Equation 
(4.4)). 

(b) The exponential smoothing model is the special case $\alpha_i = \alpha(1 - \alpha)^i$ for 
$i = 1, 2, \ldots$ and $p \to \infty$. 

(c) The model is a regression of $x_t$ on past terms from the same series; hence 
the use of the term ‘autoregressive’. 

(d) A prediction at time $t$ is given by 

$$
\hat{x}_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \ldots + \alpha_p x_{t-p}
\tag{4.17}
$$

(e) The model parameters can be estimated by minimising the sum of squared 
errors. 

4.5.2 Stationary and non-stationary AR processes 


The equation $\theta_p(B) = 0$, where $B$ is formally treated as a number (real or 
complex), is called the characteristic equation. The roots of the characteristic 
equation (i.e., the roots of the polynomial $\theta_p(B)$ from Equation (4.16)) must all exceed 
unity in absolute value for the process to be stationary. Notice that the random 
walk has $\theta = 1 - B$ with root $B = 1$ and is non-stationary. The following four 
examples illustrate the procedure for determining whether an AR process is 
stationary or non-stationary: 

1. The AR(1) model $x_t = \frac{1}{2} x_{t-1} + w_t$ is stationary because the root of 
   $1 - \frac{1}{2} B = 0$ is $B = 2$, which is greater than 1. 

2. The AR(2) model $x_t = x_{t-1} - \frac{1}{4} x_{t-2} + w_t$ is stationary. The proof of this 
   result is obtained by first expressing the model in terms of the backward 
   shift operator: $\frac{1}{4}(B^2 - 4B + 4) x_t = w_t$; i.e., $\frac{1}{4}(B - 2)^2 x_t = w_t$. The roots 
   of the polynomial are given by solving $\theta(B) = \frac{1}{4}(B - 2)^2 = 0$ and are 
   therefore obtained as $B = 2$. As the roots are greater than unity, this 
   AR(2) model is stationary. 

3. The model $x_t = \frac{1}{2} x_{t-1} + \frac{1}{2} x_{t-2} + w_t$ is non-stationary because one of 
   the roots is unity. To prove this, first express the model in terms of the 
   backward shift operator: $-\frac{1}{2}(B^2 + B - 2) x_t = w_t$; i.e., $-\frac{1}{2}(B - 1)(B + 2) x_t = w_t$. 
   The polynomial $\theta(B) = -\frac{1}{2}(B - 1)(B + 2)$ has roots $B = 1, -2$. As 
   there is a unit root ($B = 1$), the model is non-stationary. Note that the 
   other root ($B = -2$) exceeds unity in absolute value, so only the presence 
   of the unit root makes this process non-stationary. 

4. The AR(2) model $x_t = -\frac{1}{4} x_{t-2} + w_t$ is stationary because the roots of 
   $1 + \frac{1}{4} B^2 = 0$ are $B = \pm 2i$, which are complex numbers with $i = \sqrt{-1}$, 
   each having an absolute value of 2 exceeding unity. 

The R function polyroot finds zeros of polynomials and can be used to find 
the roots of the characteristic equation to check for stationarity. 
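
The four examples above can be checked with `polyroot`, which takes the polynomial coefficients in increasing powers of $B$. The helper `ar_stationary` below is a hypothetical convenience function written for this sketch, not part of R; the small tolerance is an assumption added so that a numerically computed unit root is not misclassified.

```r
# Check stationarity of an AR(p) model from its characteristic equation
# 1 - alpha_1 B - ... - alpha_p B^p = 0
ar_stationary <- function(alpha, tol = 1e-8) {
  roots <- polyroot(c(1, -alpha))   # coefficients in increasing powers of B
  all(Mod(roots) > 1 + tol)         # all roots must exceed unity in absolute value
}

Mod(polyroot(c(1, -1/2)))           # example 1: x_t = 1/2 x_{t-1} + w_t
Mod(polyroot(c(1, -1, 1/4)))        # example 2: x_t = x_{t-1} - 1/4 x_{t-2} + w_t
Mod(polyroot(c(1, -1/2, -1/2)))     # example 3: x_t = 1/2 x_{t-1} + 1/2 x_{t-2} + w_t
Mod(polyroot(c(1, 0, 1/4)))         # example 4: x_t = -1/4 x_{t-2} + w_t

sapply(list(0.5, c(1, -0.25), c(0.5, 0.5), c(0, -0.25)), ar_stationary)
```

Because the roots are found numerically, the printed moduli may differ from the exact values in the examples by rounding error.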


4.5.3 Second-order properties of an AR(1) model 


From Equation (4.15), the AR(1) process is given by 

$$
x_t = \alpha x_{t-1} + w_t
\tag{4.18}
$$

where $\{w_t\}$ is a white noise series with mean zero and variance $\sigma^2$. It can be 
shown (§4.5.4) that the second-order properties follow as 

$$
\mu_x = 0, \qquad \gamma_k = \alpha^k \sigma^2 / (1 - \alpha^2)
\tag{4.19}
$$


4.5.4 Derivation of second-order properties for an AR(1) process* 


Using $B$, a stable AR(1) process ($|\alpha| < 1$) can be written as 

$$
\begin{aligned}
(1 - \alpha B) x_t &= w_t \\
\Rightarrow\; x_t &= (1 - \alpha B)^{-1} w_t \\
&= w_t + \alpha w_{t-1} + \alpha^2 w_{t-2} + \ldots = \sum_{i=0}^{\infty} \alpha^i w_{t-i}
\end{aligned}
\tag{4.20}
$$

Hence, the mean is given by 

$$
E(x_t) = E\left( \sum_{i=0}^{\infty} \alpha^i w_{t-i} \right) = \sum_{i=0}^{\infty} \alpha^i E(w_{t-i}) = 0
$$

and the autocovariance follows as 

$$
\begin{aligned}
\gamma_k = \mathrm{Cov}(x_t, x_{t+k})
&= \mathrm{Cov}\left( \sum_{i=0}^{\infty} \alpha^i w_{t-i}, \sum_{j=0}^{\infty} \alpha^j w_{t+k-j} \right) \\
&= \sum_{j=k+i} \alpha^i \alpha^j \, \mathrm{Cov}(w_{t-i}, w_{t+k-j}) \\
&= \alpha^k \sigma^2 \sum_{i=0}^{\infty} \alpha^{2i} = \alpha^k \sigma^2 / (1 - \alpha^2)
\end{aligned}
$$

using Equations (2.15) and (4.2). 


4.5.5 Correlogram of an AR(1) process 


From Equation (4.19), the autocorrelation function follows as 

$$
\rho_k = \alpha^k \qquad (k \geq 0)
\tag{4.21}
$$

where $|\alpha| < 1$. Thus, the correlogram decays to zero more rapidly for small $\alpha$. 
The following example gives two correlograms for positive and negative values 
of $\alpha$, respectively (Fig. 4.11): 


> rho <- function(k, alpha) alpha^k 
> layout(1:2) 
> plot(0:10, rho(0:10, 0.7), type = "b") 
> plot(0:10, rho(0:10, -0.7), type = "b") 


Try experimenting using other values for $\alpha$. For example, use a small value of 
$\alpha$ to observe a more rapid decay to zero in the correlogram. 
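
As a cross-check on Equation (4.21), the closed form $\alpha^k$ can be compared with the theoretical autocorrelations returned by `ARMAacf` in the stats package. This is a sketch only; the value $\alpha = 0.7$ matches the example above, and the object names are illustrative.

```r
# Compare rho_k = alpha^k with the theoretical ACF computed by ARMAacf()
rho <- function(k, alpha) alpha^k

alpha <- 0.7
closed_form <- rho(0:10, alpha)
from_stats  <- unname(ARMAacf(ar = alpha, lag.max = 10))   # lags 0 to 10

all.equal(closed_form, from_stats)
```

`ARMAacf` becomes more useful for higher-order models, where no expression as simple as $\alpha^k$ is available.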


4.5.6 Partial autocorrelation 


From Equation (4.21), the autocorrelations are non-zero for all lags even 
though in the underlying model $x_t$ only depends on the previous value $x_{t-1}$ 
(Equation (4.18)). The partial autocorrelation at lag $k$ is the correlation that 
results after removing the effect of any correlations due to the terms at shorter 
lags. For example, the partial autocorrelation of an AR(1) process will be zero 
for all lags greater than 1. In general, the partial autocorrelation at lag $k$ is 
the $k$th coefficient of a fitted AR(k) model; if the underlying process is AR(p), 
then the coefficients $\alpha_k$ will be zero for all $k > p$. Thus, an AR(p) process has 
a correlogram of partial autocorrelations that is zero after lag $p$. Hence, a plot 
of the estimated partial autocorrelations can be useful when determining the 
order of a suitable AR process for a time series. In R, the function pacf can 
be used to calculate the partial autocorrelations of a time series and produce 
a plot of the partial autocorrelations against lag (the ‘partial correlogram’). 
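
The statement that the partial autocorrelation at lag $k$ is the $k$th coefficient of a fitted AR(k) model can be illustrated numerically. The sketch below assumes the AR(k) models are fitted by the Yule-Walker method with the order fixed (no AIC selection), so that the fit uses the same sample autocorrelations as `pacf`; the simulated series and the names `p` and `last_coef` are illustrative.

```r
# Partial autocorrelation at lag k versus the k-th coefficient of a fitted AR(k)
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 500)   # simulated AR(1) series

# Sample partial autocorrelations at lags 1 to 5
p <- pacf(x, lag.max = 5, plot = FALSE)$acf[, 1, 1]

# Last coefficient of Yule-Walker AR(k) fits, k = 1, ..., 5
last_coef <- sapply(1:5, function(k) {
  ar(x, order.max = k, aic = FALSE, method = "yule-walker")$ar[k]
})

cbind(lag = 1:5, pacf = p, last_coef = last_coef)
```

Other fitting methods (for example maximum likelihood) would give slightly different coefficients, so the agreement is expected to be exact only for the Yule-Walker fit.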


4.5.7 Simulation 


An AR(1) process can be simulated in R as follows: 


> set.seed(1) 
> x <- w <- rnorm(100) 
> for (t in 2:100) x[t] <- 0.7 * x[t - 1] + w[t] 
> plot(x, type = "l") 
> acf(x) 
> pacf(x) 


The resulting plots of the simulated data are shown in Figure 4.12 and give one 
possible realisation of the model. The partial correlogram has no significant 
correlations except the value at lag 1, as expected (Fig. 4.12c; note that the 
pacf starts at lag 1, whilst the acf starts at lag 0). Comparing the 
correlogram of the underlying model (Fig. 4.11a) with the sample correlogram 
of the simulated series (Fig. 4.12b) reveals discrepancies that have arisen due 
to sampling variation. Try repeating the commands above several times to 
obtain a range of possible sample correlograms for an AR(1) process with 
underlying parameter $\alpha = 0.7$. You are asked to investigate an AR(2) process 
in Exercise 4. 

Fig. 4.11. Example correlograms for two autoregressive models: (a) $x_t = 0.7x_{t-1} + w_t$; 
(b) $x_t = -0.7x_{t-1} + w_t$. 


4.6 Fitted models 


4.6.1 Model fitted to simulated series 


An AR(p) model can be fitted to data in R using the ar function. In the code 
below, the autoregressive model x.ar is fitted to the simulated series of the 
last section and an approximate 95% confidence interval for the underlying 
parameter is given, where the (asymptotic) variance of the parameter estimate 
is extracted using x.ar$asy.var: 


> x.ar <- ar(x, method = "mle") 
> x.ar$order 
[1] 1 
> x.ar$ar 
[1] 0.601 
> x.ar$ar + c(-2, 2) * sqrt(x.ar$asy.var) 
[1] 0.4404 0.7615 


Fig. 4.12. A simulated AR(1) process, $x_t = 0.7x_{t-1} + w_t$. Note that in the partial 
correlogram (c) only the first lag is significant, which is usually the case when the 
underlying process is AR(1). 


The method “mle” used in the fitting procedure above is based on max- 
imising the likelihood function (the probability of obtaining the data given the 
model) with respect to the unknown parameters. The order $p$ of the process 
is chosen using the Akaike Information Criterion (AIC; Akaike, 1974), which 
penalises models with too many parameters: 

$$
\mathrm{AIC} = -2 \times \text{log-likelihood} + 2 \times \text{number of parameters}
\tag{4.22}
$$

In the function ar, the model with the smallest AIC is selected as the best- 
fitting AR model. Note that, in the code above, the correct order ($p = 1$) 
of the underlying process is recovered. The parameter estimate for the fitted 
AR(1) model is $\hat{\alpha} = 0.60$. Whilst this is smaller than the underlying model 
value of $\alpha = 0.7$, the approximate 95% confidence interval does contain the 
value of the model parameter as expected, giving us no reason to doubt the 
implementation of the model. 
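
The AIC comparison made inside `ar` can be inspected through the `aic` component of the fitted object, which holds the difference in AIC between each candidate order and the best one. The sketch below repeats the simulation of §4.5.7 so that it is self-contained; `best_order` is an illustrative name.

```r
# Inspect the AIC differences used by ar() to choose the order
set.seed(1)
x <- w <- rnorm(100)
for (t in 2:100) x[t] <- 0.7 * x[t - 1] + w[t]

x.ar <- ar(x, method = "mle")

round(x.ar$aic, 2)                          # AIC minus the smallest AIC, by order
best_order <- as.integer(names(which.min(x.ar$aic)))
best_order                                  # order with the smallest AIC
x.ar$order                                  # order actually selected by ar()
```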


4.6.2 Exchange rate series: Fitted AR model 


An AR(1) model is fitted to the exchange rate series, and the approximate 95% 
confidence interval for the parameter includes 1. This indicates that 
there would not be sufficient evidence to reject the hypothesis $\alpha = 1$, which is 
consistent with the earlier conclusion that a random walk provides a good ap- 
proximation for this series. However, simulated data from models with values 
of $\alpha > 1$, formally included in the confidence interval below, exhibit exponen- 
tially unstable behaviour and are not credible models for the New Zealand 
exchange rate. 


> Z.ar <- ar(Z.ts) 
> mean(Z.ts) 


[1] 2.823 
> Z.ar$order 


[1] 1 
> Z.ar$ar 


[1] 0.8903 


> Z.ar$ar + c(-2, 2) * sqrt(Z.ar$asy.var) 


[1] 0.7405 1.0400 


> acf(Z.ar$res[-1]) 


In the code above, a “-1” is used in the vector of residuals to remove the 
first item from the residual series (Fig. 4.13). (For a fitted AR(1) model, the 
first item has no predicted value because there is no observation at $t = 0$; in 
general, the first $p$ values will be ‘not available’ (NA) in the residual series of 
a fitted AR(p) model.) 

By default, the mean is subtracted before the parameters are estimated, 
so a predicted value $\hat{x}_t$ at time $t$ based on the output above is given by 

$$
\hat{x}_t = 2.8 + 0.89 (x_{t-1} - 2.8)
\tag{4.23}
$$

Fig. 4.13. The correlogram of residual series for the AR(1) model fitted to the 
exchange rate data. 
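
The form of Equation (4.23), in which the mean is subtracted, the AR coefficient applied, and the mean added back, can be checked against R's `predict` function. The sketch below uses a simulated series rather than the exchange rate data; the coefficient 0.9, the mean 2.8, and the object names are assumptions made for illustration.

```r
# One-step-ahead prediction from a fitted AR(1): manual formula versus predict()
set.seed(1)
x <- w <- rnorm(200)
for (t in 2:200) x[t] <- 0.9 * x[t - 1] + w[t]
x <- x + 2.8                                   # shift to a non-zero mean

fit <- ar(x, order.max = 1, aic = FALSE)       # force an AR(1) fit (Yule-Walker)
n <- length(x)

# mean + coefficient * (last observation - mean), as in Equation (4.23)
manual <- fit$x.mean + fit$ar * (x[n] - fit$x.mean)

auto <- predict(fit, newdata = x, n.ahead = 1)$pred

c(manual = as.numeric(manual), predict = as.numeric(auto))
```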


4.6.3 Global temperature series: Fitted AR model 


The global temperature series was introduced in §1.4.5, where it was apparent 
that the data exhibited an increasing trend after 1970, which may be due to 
the ‘greenhouse effect’. Sceptics may claim that the apparent increasing trend 
can be dismissed as a transient stochastic phenomenon. For their claim to be 
consistent with the time series data, it should be possible to model the trend 
without the use of deterministic functions. 
Consider the following AR model fitted to the mean annual temperature 
series: 


> www = "http://www.massey.ac.nz/~pscowper/ts/global.dat" 
> Global = scan(www) 
> Global.ts = ts(Global, st = c(1856, 1), end = c(2005, 12), fr = 12) 
> Global.ar <- ar(aggregate(Global.ts, FUN = mean), method = "mle") 
> mean(aggregate(Global.ts, FUN = mean)) 
[1] -0.1383 
> Global.ar$order 
[1] 4 
> Global.ar$ar 
[1] 0.58762 0.01260 0.11117 0.26764 
> acf(Global.ar$res[-(1:Global.ar$order)], lag = 50) 


Fig. 4.14. The correlogram of the residual series for the AR(4) model fitted to the 
annual global temperature series. The correlogram is approximately white noise so 
that, in the absence of further information, a simple stochastic model can ‘explain’ 
the correlation and trends in the series. 

Based on the output above a predicted mean annual temperature $\hat{x}_t$ at 
time $t$ is given by 

$$
\begin{aligned}
\hat{x}_t = {} & -0.14 + 0.59 (x_{t-1} + 0.14) + 0.013 (x_{t-2} + 0.14) \\
& + 0.11 (x_{t-3} + 0.14) + 0.27 (x_{t-4} + 0.14)
\end{aligned}
\tag{4.24}
$$

The correlogram of the residuals has only one (marginally) significant value 
at lag 27, so the underlying residual series could be white noise (Fig. 4.14). 
Thus the fitted AR(4) model (Equation (4.24)) provides a good fit to the 
data. As the AR model has no deterministic trend component, the trends in 
the data can be explained by serial correlation and random variation, implying 
that it is possible that these trends are stochastic (or could arise from a purely 
stochastic process). Again we emphasise that this does not imply that there is 
no underlying reason for the trends. If a valid scientific explanation is known, 
such as a link with the increased use of fossil fuels, then this information would 
clearly need to be included in any future forecasts of the series. 


4.7 Summary of R commands 


set.seed   sets a seed for the random number generator, 
           enabling a simulation to be reproduced 
rnorm      simulates Gaussian white noise series 
diff       creates a series of first-order differences 
ar         gets the best fitting AR(p) model 
pacf       extracts partial autocorrelations 
           and partial correlogram 
polyroot   extracts the roots of a polynomial 
resid      extracts the residuals from a fitted model 


4.8 Exercises 


1. Simulate discrete white noise from an exponential distribution and plot the 
histogram and the correlogram. For example, you can use the R command 
w <- rexp(1000)-1 for exponential white noise. Comment on the plots. 


2. a) Simulate time series of length 100 from an AR(1) model with $\alpha$ equal 
      to $-0.9$, $-0.5$, 0.5, and 0.9. Estimate the parameter of each model and 
      make predictions for 1 to 10 steps ahead. 

   b) Simulate time series of length 100 from an AR(1) model with $\alpha$ equal 
      to 1.01, 1.02, and 1.05. Estimate the parameters of these models. 


3. An AR(1) model with a non-zero mean $\mu$ can be expressed by either 
   $x_t - \mu = \alpha (x_{t-1} - \mu) + w_t$ or $x_t = \alpha_0 + \alpha_1 x_{t-1} + w_t$. 

   a) What is the relationship between the parameters $\mu$ and $\alpha$ and the 
      parameters $\alpha_0$ and $\alpha_1$? 

   b) Deduce a similar relationship for an AR(2) process with mean $\mu$. 


4. a) Simulate a time series of length 1000 for the following model, giving 
      appropriate R code and placing the simulated data in a vector x: 

      $$
      x_t = \frac{5}{6} x_{t-1} - \frac{1}{6} x_{t-2} + w_t
      \tag{4.25}
      $$

   b) Plot the correlogram and partial correlogram for the simulated data. 
      Comment on the plots. 

   c) Fit an AR model to the data in x giving the parameter estimates and 
      order of the fitted AR process. 

   d) Construct 95% confidence intervals for the parameter estimates of 
      the fitted model. Do the model parameters fall within the confidence 
      intervals? Explain your results. 

   e) Is the model in Equation (4.25) stationary or non-stationary? Justify 
      your answer. 

   f) Plot the correlogram of the residuals of the fitted model, and comment 
      on the plot. 


5. a) Show that the series $\{x_t\}$ given by $x_t = \frac{3}{2} x_{t-1} - \frac{1}{2} x_{t-2} + w_t$ is non- 
      stationary. 

   b) Write down the model for $\{y_t\}$, where $y_t = \nabla x_t$. Show that $\{y_t\}$ is 
      stationary. 

   c) Simulate a series of 1000 values for $\{x_t\}$, placing the simulated data 
      in x, and use these simulated values to produce a series of 999 values 
      for $\{y_t\}$, placing this series in the vector y. 

   d) Fit an AR model to y. Give the fitted model parameter estimates and a 
      95% confidence interval for the underlying model parameters based on 
      these estimates. Compare the confidence intervals to the parameters 
      used to simulate the data and explain the results. 

   e) Plot the correlogram of the residuals of the fitted model and comment. 


6. a) Refit the AR(4) model of §4.6.3 to the annual mean global temperature 
      series, and using the fitted model create a series of predicted values 
      from $t = 2$ to the last value in the series (using Equation (4.24)). 
      Create a residual series from the difference between the predicted value 
      and the observed value, and verify that within machine accuracy your 
      residual series is identical to the series extracted from the fitted model 
      in R. 

   b) Plot a correlogram and partial correlogram for the mean annual tem- 
      perature series. Comment on the plots. 

   c) Use the predict function in R to forecast 100 years of future values 
      for the annual global temperature series using the fitted AR(4) model 
      (Equation (4.24)) of §4.6.3. 

   d) Create a time plot of the mean annual temperature series and add the 
      100-year forecasts to the plot (use a different colour or symbol for the 
      forecasts). 

   e) Add a line representing the overall mean global temperature. Com- 
      ment on the final plot and any potential inadequacies in the fitted 
      model. 


7. Prove Equation (4.12) by mathematical induction as follows. (i) First, 
   show that if Equation (4.12) holds for $n = k$, then it also holds for $n = 
   k + 1$. (ii) Next, show that Equation (4.12) holds for the case $n = 2$ and 
   hence (from i) holds for all $n$. 


8. Prove Equation (4.14). [Hint: Express the two equations in (4.13) in terms 
   of the backward shift operator and then substitute for bn.] 


Regression 


5.1 Purpose 


Trends in time series can be classified as stochastic or deterministic. We may 
consider a trend to be stochastic when it shows inexplicable changes in di- 
rection, and we attribute apparent transient trends to high serial correlation 
with random error. Trends of this type, which are common in financial series, 
can be simulated in R using models such as the random walk or autoregressive 
process (Chapter 4). In contrast, when we have some plausible physical ex- 
planation for a trend we will usually wish to model it in some deterministic 
manner. For example, a deterministic increasing trend in the data may be 
related to an increasing population, or a regular cycle may be related to a 
known seasonal frequency. Deterministic trends and seasonal variation can be 
modelled using regression. 

The practical difference between stochastic and deterministic trends is 
that we extrapolate the latter when we make forecasts. We justify short-term 
extrapolation by claiming that underlying trends will usually change slowly 
in comparison with the forecast lead time. For the same reason, short-term 
extrapolation should be based on a line, maybe fitted to the more recent data 
only, rather than a high-order polynomial. 

In this chapter various regression models are studied that are suitable for 
a time series analysis of data that contain deterministic trends and regular 
seasonal changes. We begin by looking at linear models for trends and then 
introduce regression models that account for seasonal variation using indica- 
tor and harmonic variables. Regression models can also include explanatory 
variables. The logarithmic transformation, which is often used to stabilise the 
variance, is also considered. 

Time series regression usually differs from a standard regression analysis 
because the residuals form a time series and therefore tend to be serially cor- 
related. When this correlation is positive, the estimated standard errors of 
the parameter estimates, read from the computer output of a standard re- 
gression analysis, will tend to be less than their true value. This will lead 
to erroneously high statistical significance being attributed to statistical tests 
in standard computer output (the p values will be smaller than they should 
be). Presenting correct statistical evidence is important. For example, an en- 
vironmental protection group could be undermined by allegations that it is 
falsely claiming statistically significant trends. In this chapter, generalised 
least squares is used to obtain improved estimates of the standard error to 
account for autocorrelation in the residual series. 


5.2 Linear models 


5.2.1 Definition 


A model for a time series {x+ : t = 1,...n] is linear if it can be expressed as 


Lt = Qo + 031, + AQUat +... d- AmUm,t + 2 (5.1) 


where u;; is the value of the ith predictor (or explanatory) variable at time 
t (i =1,...,m;t = 1,...,n), z is the error at time t, and ao, Q@1,...,Qm 
are model parameters, which can be estimated by least squares. Note that the 
errors form a time series {z+}, with mean 0, that does not have to be Gaussian 
or white noise. Àn example of a linear model is the pth-order polynomial 
function of t: 

£i = ag + ait + aat? ... + pl? + zi (5.2) 


The predictor variables can be written uj, = t (i = 1,...,p). The term 
‘linear’ is a reference to the summation of model parameters, each multiplied 
by a single predictor variable. 

A simple special case of a linear model is the straight-line model obtained 
by putting p = 1 in Equation (5.2): x, = ao + ait + z+. In this case, the value 
of the line at time t is the trend m+. For the more general polynomial, the 
trend at time t is the value of the underlying polynomial evaluated at t, so in 
Equation (5.2) the trend is m; = ao + ait + azt? .. . + apt”. 

Many non-linear models can be transformed to linear models. For example,
the model $x_t = e^{\alpha_0 + \alpha_1 t + z_t}$ for the series $\{x_t\}$ can be
transformed by taking natural logarithms to obtain a linear model for the
series $\{y_t\}$:


$$y_t = \log x_t = \alpha_0 + \alpha_1 t + z_t \qquad (5.3)$$


In Equation (5.3), standard least squares regression could then be used to fit
a linear model (i.e., estimate the parameters $\alpha_0$ and $\alpha_1$) and make
predictions for $y_t$. To make predictions for $x_t$, the inverse transform needs
to be applied to $y_t$, which in this example is $\exp(y_t)$. However, this
usually has the effect of biasing the forecasts of mean values, and we discuss
correction factors in §5.10.
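
The back-transformation bias can be seen in a small simulation. The sketch below is illustrative only: the parameter values (1, 0.02, and an error standard deviation of 0.5) are assumptions chosen for the example, not values used elsewhere in this chapter. Because the residuals of the log-scale fit average zero, the ratios of the observed values to the naive back-transformed fitted values average more than one.

```r
# Sketch: fit the log-linear model of Equation (5.3) to simulated data and
# back-transform. All numerical values here are illustrative assumptions.
set.seed(1)
Time <- 1:100
z <- rnorm(100, sd = 0.5)
x <- exp(1 + 0.02 * Time + z)       # x_t = exp(alpha0 + alpha1 * t + z_t)

y.lm <- lm(log(x) ~ Time)           # linear model for y_t = log(x_t)
x.naive <- exp(fitted(y.lm))        # inverse transform of the fitted values

# x / x.naive equals exp(residual); its mean exceeds 1 whenever the
# residuals are not all zero, so x.naive underestimates the mean level.
bias.ratio <- mean(x / x.naive)
bias.ratio
```

A ratio above one indicates that the naive inverse transform tends to sit below the mean of the original series, which is why a correction factor is needed.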

Natural processes that generate time series are not expected to be precisely 
linear, but linear approximations are often adequate. However, we are not 
restricted to linear models, and the Bass model (§3.3) is an example of a
non-linear model, which we fitted using the non-linear least squares function nls.


5.2.2 Stationarity 


Linear models for time series are non-stationary when they include functions
of time. Differencing can often transform a non-stationary series with a
deterministic trend to a stationary series. For example, if the time series
$\{x_t\}$ is given by the straight-line function plus white noise
$x_t = \alpha_0 + \alpha_1 t + z_t$, then the first-order differences are given by


$$\nabla x_t = x_t - x_{t-1} = z_t - z_{t-1} + \alpha_1 \qquad (5.4)$$


Assuming the error series $\{z_t\}$ is stationary, the series $\{\nabla x_t\}$ is
stationary as it is not a function of $t$. In §4.3.6 we found that first-order
differencing can transform a non-stationary series with a stochastic trend (the
random walk) to a stationary series. Thus, differencing can remove both
stochastic and deterministic trends from time series. If the underlying trend
is a polynomial of order $m$, then $m$th-order differencing is required to
remove the trend.

Notice that differencing the straight-line function plus white noise leads to
a different stationary time series than subtracting the trend. The latter gives
white noise, whereas differencing gives a series formed from the differences of
consecutive white noise terms (which is an example of an MA process, described
in Chapter 6).
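
The effect of differencing can be checked numerically. The following sketch uses simulated series with assumed parameter values (a slope of 3 and a quadratic coefficient of 0.01). For the straight line plus white noise, the differenced series has mean equal to the slope and a lag 1 autocorrelation of −0.5 in theory, which is the signature of differenced white noise rather than of white noise itself.

```r
# Sketch: differencing a deterministic trend (illustrative values only).
set.seed(1)
Time <- 1:1000
w <- rnorm(1000)

# Straight line plus white noise: first differences are alpha1 + w_t - w_{t-1}
x <- 50 + 3 * Time + w
dx <- diff(x)
mean.dx <- mean(dx)                          # close to the slope, 3
r1.dx <- acf(dx, plot = FALSE)$acf[2]        # lag 1 autocorrelation, about -0.5

# Quadratic trend: second-order differencing is needed to remove the trend
x2 <- 0.01 * Time^2 + w
d2 <- diff(x2, differences = 2)
mean.d2 <- mean(d2)                          # close to 2 * 0.01 = 0.02
```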


5.2.3 Simulation 


In time series regression, it is common for the error series $\{z_t\}$ in
Equation (5.1) to be autocorrelated. In the code below a time series with an
increasing straight-line trend ($50 + 3t$) with autocorrelated errors is
simulated and plotted:


> set.seed(1)
> z <- w <- rnorm(100, sd = 20)    # white noise w; z is initialised to w
> for (t in 2:100) z[t] <- 0.8 * z[t - 1] + w[t]    # AR(1) errors
> Time <- 1:100
> x <- 50 + 3 * Time + z
> plot(x, xlab = "time", type = "l")

The model for the code above can be expressed as $x_t = 50 + 3t + z_t$, where
$\{z_t\}$ is the AR(1) process $z_t = 0.8 z_{t-1} + w_t$ and $\{w_t\}$ is Gaussian
white noise with $\sigma = 20$. A time plot of a realisation of $\{x_t\}$ is given
in Figure 5.1.

Fig. 5.1. Time plot of a simulated time series with a straight-line trend and AR(1)
residual errors.


5.3 Fitted models 


5.3.1 Model fitted to simulated data 


Linear models are usually fitted by minimising the sum of squared errors,
$\sum z_t^2 = \sum (x_t - \alpha_0 - \alpha_1 u_{1,t} - \ldots - \alpha_m u_{m,t})^2$,
which is achieved in R using the function lm:


> x.lm <- lm(x ~ Time)
> coef(x.lm)
(Intercept)        Time
      58.55        3.06

> sqrt(diag(vcov(x.lm)))
(Intercept)        Time
     4.8801      0.0839


In the code above, the estimated parameters of the linear model are extracted 
using coef. Note that, as expected, the estimates are close to the underlying 
parameter values of 50 for the intercept and 3 for the slope. The standard 
errors are extracted using the square root of the diagonal elements obtained 
from vcov, although these standard errors are likely to be underestimated 
because of autocorrelation in the residuals. The function summary can also be 
used to obtain this information but tends to give additional information, for 
example t-tests, which may be incorrect for a time series regression analysis 
due to autocorrelation in the residuals. 
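
The extent of the underestimation can be gauged by repeating the simulation of §5.2.3 many times and comparing the spread of the slope estimates with the standard error that lm reports. The sketch below assumes 200 replicates, which is an arbitrary choice made to keep the run time short.

```r
# Sketch: Monte Carlo check of the OLS standard error of the slope when the
# errors follow the AR(1) process of Section 5.2.3 (200 replicates assumed).
set.seed(1)
n <- 100
n.sim <- 200
Time <- 1:n
slope <- se <- numeric(n.sim)

for (i in 1:n.sim) {
  z <- w <- rnorm(n, sd = 20)
  for (t in 2:n) z[t] <- 0.8 * z[t - 1] + w[t]
  x <- 50 + 3 * Time + z
  fit <- lm(x ~ Time)
  slope[i] <- coef(fit)[2]                  # estimated slope
  se[i] <- sqrt(diag(vcov(fit)))[2]         # standard error reported by lm
}

sd.slope <- sd(slope)    # empirical variability of the slope estimates
mean.se <- mean(se)      # average standard error reported by lm
c(empirical = sd.slope, reported = mean.se)
```

If the reported standard errors were reliable, the two numbers would be similar; with positively autocorrelated errors the empirical value is expected to be clearly larger.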

After fitting a regression model, we should consider various diagnostic
plots. In the case of time series regression, an important diagnostic plot is the
correlogram of the residuals:


> acf(resid(x.lm))
> pacf(resid(x.lm))


As expected, the residual time series is autocorrelated (Fig. 5.2). In Figure 
5.3, only the lag 1 partial autocorrelation is significant, which suggests that 
the residual series follows an AR(1) process. Again this should be as expected, 
given that an AR(1) process was used to simulate these residuals. 

Fig. 5.2. Residual correlogram for the fitted straight-line model.

Fig. 5.3. Residual partial correlogram for the fitted straight-line model.


5.3.2 Model fitted to the temperature series (1970—2005) 


In §1.4.5, we extracted temperatures for the period 1970-2005. The following
regression model is fitted to the global temperature over this period,
and approximate 95% confidence intervals are given for the parameters using
confint. The explanatory variable is the time, so the function time is
used to extract the ‘times’ from the ts temperature object.


> www <- "http://www.massey.ac.nz/~pscowper/ts/global.dat"
> Global <- scan(www)
> Global.ts <- ts(Global, start = c(1856, 1), end = c(2005, 12),
    frequency = 12)
> temp <- window(Global.ts, start = 1970)
> temp.lm <- lm(temp ~ time(temp))
> coef(temp.lm)
(Intercept)  time(temp)
   -34.9204      0.0177

> confint(temp.lm)
               2.5 %   97.5 %
(Intercept) -37.2100 -32.6308
time(temp)    0.0165   0.0188

> acf(resid(lm(temp ~ time(temp))))


The confidence interval for the slope does not contain zero, which would pro- 
vide statistical evidence of an increasing trend in global temperatures if the 
autocorrelation in the residuals is negligible. However, the residual series is 
positively autocorrelated at shorter lags (Fig. 5.4), leading to an underesti- 
mate of the standard error and too narrow a confidence interval for the slope. 

Intuitively, the positive correlation between consecutive values reduces the 
effective record length because similar values will tend to occur together. The 
following section illustrates the reasoning behind this but may be omitted, 
without loss of continuity, by readers who do not require the mathematical 
details. 


5.3.3 Autocorrelation and the estimation of sample statistics* 


To illustrate the effect of autocorrelation in estimation, the sample mean will 
be used, as it is straightforward to analyse and is used in the calculation of 
other statistical properties. 

Suppose $\{x_t : t = 1, \ldots, n\}$ is a time series of independent random
variables with mean $E(x_t) = \mu$ and variance $\mathrm{Var}(x_t) = \sigma^2$.
Then it is well known in the study of random samples that the sample mean
$\bar{x} = \sum_{t=1}^{n} x_t / n$ has mean $E(\bar{x}) = \mu$ and variance
$\mathrm{Var}(\bar{x}) = \sigma^2/n$ (or standard error $\sigma/\sqrt{n}$). Now let
$\{x_t : t = 1, \ldots, n\}$ be a stationary time series with $E(x_t) = \mu$,
$\mathrm{Var}(x_t) = \sigma^2$, and autocorrelation function
$\mathrm{Cor}(x_t, x_{t+k}) = \rho_k$. Then the variance of the sample mean is
given by


$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} \left[ 1 + 2 \sum_{k=1}^{n-1} (1 - k/n) \rho_k \right] \qquad (5.5)$$


Fig. 5.4. Residual correlogram for the regression model fitted to the global
temperature series (1970-2005).


In Equation (5.5) the variance $\sigma^2/n$ for an independent random sample
arises as the special case where $\rho_k = 0$ for all $k > 0$. If $\rho_k > 0$,
then $\mathrm{Var}(\bar{x}) > \sigma^2/n$ and the resulting estimate of $\mu$ is
less accurate than that obtained from a random (independent) sample of the same
size. Conversely, if $\rho_k < 0$, then the variance of the estimate may actually
be smaller than the variance obtained from a random sample of the same size.
This latter result is due to the tendency for a value above the mean to be
followed by a value below the mean, thus providing a more efficient estimate of
the overall mean level. Conversely, for a positive correlation, values are more
likely to persist above or below the mean, resulting in a less efficient
estimate of the overall mean. Thus, for a positively correlated series, a larger
sample would be needed to achieve the same level of accuracy in the estimate of
$\mu$ obtained from a sample of negatively (or zero) correlated series. Equation
(5.5) can be proved using Equation (2.15) and the properties of variance:


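$$
\begin{aligned}
\mathrm{Var}(\bar{x}) &= \mathrm{Var}\left[(x_1 + x_2 + \cdots + x_n)/n\right]
  = \mathrm{Var}(x_1 + x_2 + \cdots + x_n)/n^2 \\
&= n^{-2}\, \mathrm{Cov}\Big( \sum_{i=1}^{n} x_i, \sum_{j=1}^{n} x_j \Big)
  = n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}(x_i, x_j) \\
&= n^{-2} \big[\, \gamma_0 + \gamma_1 + \cdots + \gamma_{n-2} + \gamma_{n-1} \\
&\qquad\; + \gamma_1 + \gamma_0 + \cdots + \gamma_{n-3} + \gamma_{n-2} \\
&\qquad\; + \cdots \\
&\qquad\; + \gamma_{n-2} + \gamma_{n-3} + \cdots + \gamma_0 + \gamma_1 \\
&\qquad\; + \gamma_{n-1} + \gamma_{n-2} + \cdots + \gamma_1 + \gamma_0 \,\big] \\
&= n^{-2} \Big[ n \gamma_0 + 2 \sum_{k=1}^{n-1} (n - k) \gamma_k \Big]
\end{aligned}
$$

Equation (5.5) follows after substituting $\gamma_0 = \sigma^2$ and
$\rho_k = \gamma_k / \sigma^2$ in the last line above.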
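
Equation (5.5) is easy to evaluate numerically. The sketch below defines a small helper, here called var.mean (a name introduced only for this illustration), and applies it to an assumed AR(1) autocorrelation function $\rho_k = 0.8^k$ with $n = 100$ and $\sigma^2 = 1$. The ratio of the independent-sample variance to the autocorrelated variance gives a rough 'effective' sample size.

```r
# Sketch: variance of the sample mean from Equation (5.5).
# var.mean is a hypothetical helper; rho holds the autocorrelations at
# lags 1, ..., n - 1.
var.mean <- function(rho, sigma2, n) {
  k <- 1:(n - 1)
  (sigma2 / n) * (1 + 2 * sum((1 - k / n) * rho[k]))
}

n <- 100
v.ind <- var.mean(rep(0, n - 1), sigma2 = 1, n = n)        # independent case
v.pos <- var.mean(0.8^(1:(n - 1)), sigma2 = 1, n = n)      # rho_k = 0.8^k
v.neg <- var.mean((-0.5)^(1:(n - 1)), sigma2 = 1, n = n)   # rho_k = (-0.5)^k

n.eff <- n * v.ind / v.pos    # effective sample size under positive correlation
c(v.ind = v.ind, v.pos = v.pos, v.neg = v.neg, n.eff = n.eff)
```

With these assumed values the positively correlated series of length 100 carries roughly as much information about the mean as a dozen independent observations, whereas the alternating-sign autocorrelations reduce the variance below $\sigma^2/n$.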


5.4 Generalised least squares 


We have seen that in time series regression it is common and expected that the 
residual series will be autocorrelated. For a positive serial correlation in the 
residual series, this implies that the standard errors of the estimated regres- 
sion parameters are likely to be underestimated (Equation (5.5)), and should 
therefore be corrected. 

A fitting procedure known as generalised least squares (GLS) can be used 
to provide better estimates of the standard errors of the regression parameters 
to account for the autocorrelation in the residual series. The procedure is 
essentially based on maximising the likelihood given the autocorrelation in 
the data and is implemented in R in the gls function (within the nlme library, 
which you will need to load). 


5.4.1 GLS fit to simulated series 


The following example illustrates how to fit a regression model to the
simulated series of §5.2.3 using generalised least squares:


> library(nlme)
> x.gls <- gls(x ~ Time, cor = corAR1(0.8))
> coef(x.gls)
(Intercept)        Time
      58.23        3.04

> sqrt(diag(vcov(x.gls)))


(Intercept) Time 
11.925 0.202 


A lag 1 autocorrelation of 0.8 is used above because this value was used to 
simulate the data (§5.2.3). For historical series, the lag 1 autocorrelation would 
need to be estimated from the correlogram of the residuals of a fitted linear 
model; i.e., a linear model should first be fitted by ordinary least squares 
(OLS) and the lag 1 autocorrelation read off from a correlogram plot of the 
residuals of the fitted model. 
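
The two-step procedure just described can be sketched as follows, using the simulated series of §5.2.3 in place of a historical series and assuming the nlme package is installed. The lag 1 autocorrelation is extracted from the acf object rather than read from the plot; the object name r1 is introduced only for this illustration. Note that, with the default settings of corAR1, the supplied value serves as a starting value for the estimation of the AR(1) parameter rather than being held fixed.

```r
# Sketch: estimate the lag 1 residual autocorrelation from an OLS fit and
# supply it to gls (assumes the nlme package is available).
library(nlme)

set.seed(1)
z <- w <- rnorm(100, sd = 20)
for (t in 2:100) z[t] <- 0.8 * z[t - 1] + w[t]
Time <- 1:100
x <- 50 + 3 * Time + z

x.lm <- lm(x ~ Time)                              # step 1: OLS fit
r1 <- acf(resid(x.lm), plot = FALSE)$acf[2]       # lag 1 autocorrelation

x.gls <- gls(x ~ Time, correlation = corAR1(r1))  # step 2: GLS fit
se.ols <- sqrt(diag(vcov(x.lm)))
se.gls <- sqrt(diag(vcov(x.gls)))
rbind(OLS = se.ols, GLS = se.gls)
```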

In the example above, the standard errors of the parameters are considerably
greater than those obtained from OLS using lm (§5.3) and are more
accurate as they take the autocorrelation into account. The parameter
estimates from GLS will generally be slightly different from those obtained with
OLS, because of the weighting. For example, the slope is estimated as 3.06
using lm but 3.04 using gls. In principle, the GLS estimators are preferable
because their true sampling variances are smaller than those of the OLS
estimators; the smaller standard errors reported by lm understate the
variability of the OLS estimates.


5.4.2 Confidence interval for the trend in the temperature series


To calculate an approximate 95% confidence interval for the trend in the global 
temperature series (1970-2005), GLS is used to estimate the standard error 
accounting for the autocorrelation in the residual series (Fig. 5.4). In the gls 
function, the residual series is approximated as an AR(1) process with a lag 
1 autocorrelation of 0.7 read from Figure 5.4, which is used as a parameter in 
the gls function: 


> temp.gls <- gls(temp ~ time(temp), cor = corAR1(0.7)) 


> confint(temp.gls) 


               2.5 %   97.5 %
(Intercept) -39.8057 -28.4966
time(temp)    0.0144   0.0201


Although the confidence intervals above are now wider than they were in §5.3, 
zero is not contained in the intervals, which implies that the estimates are 
statistically significant, and, in particular, that the trend is significant. Thus, 
there is statistical evidence of an increasing trend in global temperatures over 
the period 1970-2005, so that, if current conditions persist, temperatures may 
be expected to continue to rise in the future. 


5.5 Linear models with seasonal variables 


5.5.1 Introduction 


As time series are observations measured sequentially in time, seasonal effects 
are often present in the data, especially annual cycles caused directly or indi- 
rectly by the Earth's movement around the Sun. Seasonal effects have already 
been observed in several of the series we have looked at, including the airline 
series (§1.4.1), the temperature series (§1.4.5), and the electricity production
series (§1.4.3). In this section, linear regression models with predictor variables
for seasonal effects are considered. 


5.5.2 Additive seasonal indicator variables 


Suppose a time series contains $s$ seasons. For example, with time series
measured over each calendar month, $s = 12$, whereas for series measured over
six-month intervals, corresponding to summer and winter, $s = 2$. A seasonal
indicator model for a time series $\{x_t : t = 1, \ldots, n\}$ containing $s$
seasons and a trend $m_t$ is given by

$$x_t = m_t + s_t + z_t \qquad (5.6)$$


where $s_t = \beta_i$ when $t$ falls in the $i$th season ($t = 1, \ldots, n$;
$i = 1, \ldots, s$) and $\{z_t\}$ is the residual error series, which may be
autocorrelated. This model takes the same form as the additive decomposition
model (Equation (1.2)) but differs in that the trend is formulated with
parameters. In Equation (5.6), $m_t$ does not have a constant term (referred to
as the intercept), i.e., $m_t$ could be a polynomial of order $p$ with
parameters $\alpha_1, \ldots, \alpha_p$. Equation (5.6) is then equivalent to a
polynomial trend in which the constant term depends on the season, so that the
$s$ seasonal parameters ($\beta_1, \ldots, \beta_s$) correspond to $s$ possible
constant terms in Equation (5.2). Equation (5.6) can therefore be written as

$$x_t = m_t + \beta_{1 + (t-1) \bmod s} + z_t \qquad (5.7)$$


For example, with a time series $\{x_t\}$ observed for each calendar month
beginning with $t = 1$ at January, a seasonal indicator model with a
straight-line trend is given by


$$
x_t = \alpha_1 t + s_t + z_t =
\begin{cases}
\alpha_1 t + \beta_1 + z_t & t = 1, 13, \ldots \\
\alpha_1 t + \beta_2 + z_t & t = 2, 14, \ldots \\
\quad \vdots \\
\alpha_1 t + \beta_{12} + z_t & t = 12, 24, \ldots
\end{cases}
\qquad (5.8)
$$


The parameters for the model in Equation (5.8) can be estimated by OLS
or GLS by treating the seasonal term $s_t$ as a ‘factor’. In R, the factor
function can be applied to seasonal indices extracted using the function cycle
(§1.4.1).


5.5.3 Example: Seasonal model for the temperature series 


The parameters of a straight-line trend with additive seasonal indices can be
estimated for the temperature series (1970-2005) as follows:


> Seas <- cycle(temp)
> Time <- time(temp)
> temp.lm <- lm(temp ~ 0 + Time + factor(Seas))
> coef(temp.lm)
          Time  factor(Seas)1  factor(Seas)2  factor(Seas)3
        0.0177       -34.9973       -34.9880       -35.0100
 factor(Seas)4  factor(Seas)5  factor(Seas)6  factor(Seas)7
      -35.0123       -35.0337       -35.0251       -35.0269
 factor(Seas)8  factor(Seas)9 factor(Seas)10 factor(Seas)11
      -35.0248       -35.0383       -35.0525       -35.0656
factor(Seas)12
      -35.0487


A zero is used within the formula to ensure that the model does not have an 
intercept. If the intercept is included in the formula, one of the seasonal terms 
will be dropped and an estimate for the intercept will appear in the output. 
However, the fitted models, with or without an intercept, would be equivalent,
as can be easily verified by rerunning the code above without the zero in
the formula. The parameters can also be estimated by GLS by replacing lm
with gls in the code above.
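
The equivalence of the two parameterisations can be checked without downloading the temperature data. The sketch below uses a simulated monthly series (the trend, seasonal shape, and noise level are assumptions made for the illustration) and compares the fitted values from the models with and without an intercept.

```r
# Sketch: seasonal indicator model with and without an intercept, fitted to
# a simulated monthly series (all numerical values are illustrative).
set.seed(1)
y <- ts(0.02 * (1:120) + rep(sin(2 * pi * (1:12) / 12), 10) +
          rnorm(120, sd = 0.1),
        start = c(1970, 1), frequency = 12)
Seas <- cycle(y)
Time <- time(y)

fit0 <- lm(y ~ 0 + Time + factor(Seas))   # twelve seasonal constants
fit1 <- lm(y ~ Time + factor(Seas))       # intercept plus eleven contrasts

# Same fitted values; only the parameterisation differs
same.fit <- all.equal(as.numeric(fitted(fit0)), as.numeric(fitted(fit1)))
same.fit

# With an intercept, the January constant becomes the intercept and the
# other seasonal terms are differences from January
jan0 <- unname(coef(fit0)["factor(Seas)1"])
jan1 <- unname(coef(fit1)["(Intercept)"])
c(jan0, jan1)
```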

Using the above fitted model, a two-year-ahead future prediction for the
temperature series is obtained as follows:


> new.t <- seq(2006, length.out = 2 * 12, by = 1/12)
> alpha <- coef(temp.lm)[1]                # slope of the trend
> beta <- rep(coef(temp.lm)[2:13], 2)      # seasonal constants, two years
> (alpha * new.t + beta)[1:4]


factor(Seas)1 factor(Seas)2 factor(Seas)3 factor(Seas)4 
0.524 0.535 0.514 0.514 


Alternatively, the predict function can be used to make forecasts provided
the new data are correctly labelled within a data.frame:


> new.dat <- data.frame(Time = new.t, Seas = rep(1:12, 2))
> predict(temp.lm, new.dat)[1:24]


1 2 3 4 5 6 7 8 9 10 11 12 
0.524 0.535 0.514 0.514 0.494 0.504 0.503 0.507 0.495 0.482 0.471 0.489 
13 14 15 16 17 18 19 20 21 22 23 24 
0.542 0.553 0.532 0.531 0.511 0.521 0.521 0.525 0.513 0.500 0.488 0.507 


5.6 Harmonic seasonal models 


In the previous section, one parameter estimate is used per season. However, 
seasonal effects often vary smoothly over the seasons, so that it may be more 
parameter-efficient to use a smooth function instead of separate indices. 

Sine and cosine functions can be used to build smooth variation into a
seasonal model. A sine wave with frequency $f$ (cycles per sampling interval),
amplitude $A$, and phase shift $\phi$ can be expressed as


$$A \sin(2\pi f t + \phi) = \alpha_s \sin(2\pi f t) + \alpha_c \cos(2\pi f t) \qquad (5.9)$$


where $\alpha_s = A\cos(\phi)$ and $\alpha_c = A\sin(\phi)$. The expression on the
right-hand side of Equation (5.9) is linear in the parameters $\alpha_s$ and
$\alpha_c$, whilst the left-hand side is non-linear because the parameter $\phi$
is within the sine function. Hence, the expression on the right-hand side is
preferred in the formulation of a seasonal regression model, so that OLS can be
used to estimate the parameters. For a time series $\{x_t\}$ with $s$ seasons
there are $[s/2]$ possible cycles.¹ The harmonic seasonal model is defined by


¹ The notation $[\ ]$ represents the integer part of the expression within. In
most practical cases, $s$ is even and so $[\ ]$ can be omitted. However, for some
‘seasons’, $s$ may be an odd number, making the notation necessary. For example,
if the ‘seasons’ are the days of the week, there would be $[7/2] = 3$ possible
cycles.

$$x_t = m_t + \sum_{i=1}^{[s/2]} \left\{ s_i \sin(2\pi i t/s) + c_i \cos(2\pi i t/s) \right\} + z_t \qquad (5.10)$$


where $m_t$ is the trend which includes a parameter for the constant term, and
$s_i$ and $c_i$ are unknown parameters. The trend may take a polynomial form as
in Equation (5.2). When $s$ is an even number, the value of the sine at frequency
1/2 (when $i = s/2$ in the summation term shown in Equation (5.10)) will
be zero for all values of $t$, and so the term can be left out of the model.
Hence, with a constant term included, the maximum number of parameters
in the harmonic model equals that of the seasonal indicator variable model
(Equation (5.6)), and the fits will be identical.
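
The claim that the full harmonic model and the seasonal indicator model give identical fits can be verified on a simulated monthly series. In the sketch below the trend, the twelve seasonal constants, and the noise level are arbitrary assumptions; the sine term at $i = 6$ is omitted because it is zero (up to rounding error) at integer values of $t$.

```r
# Sketch: full harmonic model versus seasonal indicator model for s = 12
# (simulated data; all numerical values are illustrative).
set.seed(1)
n <- 120
TIME <- 1:n
season <- rep(1:12, 10)
x <- 0.05 * TIME + c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)[season] / 10 +
  rnorm(n, sd = 0.2)

SIN <- COS <- matrix(nrow = n, ncol = 6)
for (i in 1:6) {
  COS[, i] <- cos(2 * pi * i * TIME / 12)
  SIN[, i] <- sin(2 * pi * i * TIME / 12)
}
max.sin6 <- max(abs(SIN[, 6]))     # effectively zero, so SIN[, 6] is dropped

harm.lm <- lm(x ~ TIME + SIN[, 1:5] + COS[, 1:6])    # 13 parameters
ind.lm <- lm(x ~ 0 + TIME + factor(season))          # 13 parameters

same.fit <- all.equal(as.numeric(fitted(harm.lm)), as.numeric(fitted(ind.lm)))
same.fit
```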

At first sight it may seem strange that the harmonic model has cycles of
a frequency higher than the seasonal frequency of $1/s$. However, the addition
of further harmonics has the effect of perturbing the underlying wave to make
it less regular than a standard sine wave of period $s$. This usually still gives
a dominant seasonal pattern of period $s$, but with a more realistic underlying
shape. For example, suppose data are taken at monthly intervals. Then the
second plot given below might be a more realistic underlying seasonal pattern
than the first plot, as it perturbs the standard sine wave by adding another
two harmonic terms of frequencies 2/12 and 4/12 (Fig. 5.5):


> TIME <- seq(1, 12, len = 1000)
> plot(TIME, sin(2 * pi * TIME/12), type = "l")
> plot(TIME, sin(2 * pi * TIME/12) + 0.2 * sin(2 * pi * 2 *
    TIME/12) + 0.1 * sin(2 * pi * 4 * TIME/12) + 0.1 *
    cos(2 * pi * 4 * TIME/12), type = "l")


The code above illustrates just one of many possible combinations of
harmonics that could be used to model a wide range of possible underlying
seasonal patterns.


5.6.1 Simulation 


It is straightforward to simulate a series based on the harmonic model given 
by Equation (5.10). For example, suppose the underlying model is 


$$
\begin{aligned}
x_t = {} & 0.1 + 0.005t + 0.001t^2 + \sin(2\pi t/12) + 0.2\sin(4\pi t/12) \\
 & + 0.1\sin(8\pi t/12) + 0.1\cos(8\pi t/12) + w_t
\end{aligned}
\qquad (5.11)
$$

where $\{w_t\}$ is Gaussian white noise with standard deviation 0.5. This model
has the same seasonal harmonic components as the model represented in
Figure 5.5b but also contains an underlying quadratic trend. Using the code
below, a series of length 10 years is simulated, and it is shown in Figure 5.6.


> set.seed(1)
> TIME <- 1:(10 * 12)
> w <- rnorm(10 * 12, sd = 0.5)
> Trend <- 0.1 + 0.005 * TIME + 0.001 * TIME^2
> Seasonal <- sin(2*pi*TIME/12) + 0.2*sin(2*pi*2*TIME/12) +
    0.1*sin(2*pi*4*TIME/12) + 0.1*cos(2*pi*4*TIME/12)
> x <- Trend + Seasonal + w
> plot(x, type = "l")


Fig. 5.5. Two possible underlying seasonal patterns for monthly series based on the
harmonic model (Equation (5.10)). Plot (a) is of the first harmonic over a year and
is usually too regular for most practical applications. Plot (b) is of the same wave
but with a further two harmonics added. Plot (b) illustrates just one of many ways
that an underlying sine wave can be perturbed to produce a less regular, but still
dominant, seasonal pattern of period 12 months.


5.6.2 Fit to simulated series 

With reference to Equation (5.10), it would seem reasonable to place the
harmonic variables in matrices, which can be achieved as follows:


> SIN <- COS <- matrix(nr = length(TIME), nc = 6)
> for (i in 1:6) {
    COS[, i] <- cos(2 * pi * i * TIME/12)
    SIN[, i] <- sin(2 * pi * i * TIME/12)
  }


Fig. 5.6. Ten years of simulated data for the model given by Equation (5.11).


In most cases, the order of the harmonics and polynomial trend will be
unknown. However, the harmonic coefficients are known to be independent,
which means that all harmonic coefficients that are not statistically
significant can be dropped. It is largely a subjective decision on the part of
the statistician to decide what constitutes a significant variable. An
approximate t-ratio of magnitude 2 is a common choice and corresponds to an
approximate 5% significance level. This t-ratio can be obtained by dividing the
estimated coefficient by the standard error of the estimate. The following
example illustrates the procedure applied to the simulated series of the last
section:

> x.lm1 <- lm(x ~ TIME + I(TIME^2) + COS[, 1] + SIN[, 1] +
    COS[, 2] + SIN[, 2] + COS[, 3] + SIN[, 3] + COS[, 4] +
    SIN[, 4] + COS[, 5] + SIN[, 5] + COS[, 6] + SIN[, 6])
> coef(x.lm1)/sqrt(diag(vcov(x.lm1)))
(Intercept)        TIME   I(TIME^2)    COS[, 1]    SIN[, 1]    COS[, 2]
      1.239       1.125      25.933       0.328      15.442      -0.515
   SIN[, 2]    COS[, 3]    SIN[, 3]    COS[, 4]    SIN[, 4]    COS[, 5]
      3.447       0.232      -0.703       0.228       1.053      -1.150
   SIN[, 5]    COS[, 6]    SIN[, 6]
      0.857      -0.310       0.382


The preceding output has three significant coefficients. These are used in the
following model:²


² Some statisticians choose to include both the COS and SIN terms for a particular
frequency if either has a statistically significant value.

> x.lm2 <- lm(x ~ I(TIME^2) + SIN[, 1] + SIN[, 2])
> coef(x.lm2)/sqrt(diag(vcov(x.lm2)))
(Intercept)   I(TIME^2)    SIN[, 1]    SIN[, 2]
       4.63      111.14       15.79        3.49


As can be seen in the output from the last command, the coefficients are all
significant. The estimated coefficients of the best-fitting model are given by


> coef(x.lm2)
(Intercept)   I(TIME^2)    SIN[, 1]    SIN[, 2]
    0.28040     0.00104     0.90021     0.19886


The coefficients above give the following model for predictions at time $t$:

$$\hat{x}_t = 0.280 + 0.00104t^2 + 0.900\sin(2\pi t/12) + 0.199\sin(4\pi t/12) \qquad (5.12)$$


The AIC can be used to compare the two fitted models:

> AIC(x.lm1)
[1] 165
> AIC(x.lm2)
[1] 150


As expected, the last model has the smallest AIC and therefore provides the
best fit to the data. Due to sampling variation, the best-fitting model is not
identical to the model used to simulate the data, as can easily be verified by
taking the AIC of the known underlying model:


> AIC(lm(x ~ TIME + I(TIME^2) + SIN[, 1] + SIN[, 2] + SIN[, 4] + COS[, 4]))
[1] 153


In R, the function step can be used to automate the selection of the
best-fitting model by the AIC. For the example above, the appropriate command
is step(x.lm1), which contains all the predictor variables in the form of the
first model. Try running this command, and check that the final output agrees
with the model selected above.
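
A self-contained variant of this selection is sketched below. It regenerates the simulated series of §5.6.1 and, as a design choice made for this illustration rather than the approach used above, stores the predictors as named columns of a data frame (C1, S1, and so on are hypothetical names) so that the terms reported by step are easy to read. The sine term at $i = 6$ is omitted because it is zero at integer times.

```r
# Sketch: backward selection by AIC with step() on the simulated series of
# Section 5.6.1. Column names in dat are illustrative.
set.seed(1)
TIME <- 1:(10 * 12)
w <- rnorm(10 * 12, sd = 0.5)
x <- 0.1 + 0.005 * TIME + 0.001 * TIME^2 +
  sin(2 * pi * TIME / 12) + 0.2 * sin(2 * pi * 2 * TIME / 12) +
  0.1 * sin(2 * pi * 4 * TIME / 12) + 0.1 * cos(2 * pi * 4 * TIME / 12) + w

dat <- data.frame(x = x, TIME = TIME, TIME2 = TIME^2)
for (i in 1:6) dat[[paste0("C", i)]] <- cos(2 * pi * i * TIME / 12)
for (i in 1:5) dat[[paste0("S", i)]] <- sin(2 * pi * i * TIME / 12)

full.lm <- lm(x ~ ., data = dat)        # all predictors
best.lm <- step(full.lm, trace = 0)     # backward elimination by AIC

names(coef(best.lm))                    # terms retained
c(full = AIC(full.lm), best = AIC(best.lm))
```

Because step only accepts a move when it lowers the AIC, the selected model cannot have a larger AIC than the starting model.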

A best fit can equally well be based on choosing the model that leads to 
the smallest estimated standard deviations of the errors, provided the degrees 
of freedom are taken into account. 


5.6.3 Harmonic model fitted to temperature series (1970—2005) 


In the code below, a harmonic model with a quadratic trend is fitted to the
temperature series (1970-2005) from §5.3.2. The units for the ‘time’ variable
are in ‘years’, so the divisor of 12 is not needed when creating the harmonic
variables. To reduce computation error in the OLS procedure due to large
numbers, the TIME variable is standardized after the COS and SIN predictors
have been calculated.


> SIN <- COS <- matrix(nr = length(temp), nc = 6)
> for (i in 1:6) {
    COS[, i] <- cos(2 * pi * i * time(temp))
    SIN[, i] <- sin(2 * pi * i * time(temp))
  }
> TIME <- (time(temp) - mean(time(temp)))/sd(time(temp))
> mean(time(temp))
[1] 1988
> sd(time(temp))
[1] 10.4

> temp.lm1 <- lm(temp ~ TIME + I(TIME^2) +
    COS[,1] + SIN[,1] + COS[,2] + SIN[,2] +
    COS[,3] + SIN[,3] + COS[,4] + SIN[,4] +
    COS[,5] + SIN[,5] + COS[,6] + SIN[,6])
> coef(temp.lm1)/sqrt(diag(vcov(temp.lm1)))
(Intercept)        TIME   I(TIME^2)    COS[, 1]    SIN[, 1]    COS[, 2]
     18.245      30.271       1.281       0.747       2.383       1.260
   SIN[, 2]    COS[, 3]    SIN[, 3]    COS[, 4]    SIN[, 4]    COS[, 5]
      1.919       0.640       0.391       0.551       0.168       0.324
   SIN[, 5]    COS[, 6]    SIN[, 6]
      0.345      -0.409      -0.457

> temp.lm2 <- lm(temp ~ TIME + SIN[, 1] + SIN[, 2])
> coef(temp.lm2)


(Intercept)        TIME    SIN[, 1]    SIN[, 2]
     0.1750      0.1841      0.0204      0.0162

> AIC(temp.lm)
[1] -547
> AIC(temp.lm1)
[1] -545
> AIC(temp.lm2)
[1] -561


Again, the AIC is used to compare the fitted models, and only statistically 
significant terms are included in the final model. 

To check the adequacy of the fitted model, it is appropriate to create a 
time plot and correlogram of the residuals because the residuals form a time 
series (Fig. 5.7). The time plot is used to detect patterns in the series. For 
example, if a higher-ordered polynomial is required, this would show up as a 
curve in the time plot. The purpose of the correlogram is to determine whether 
there is autocorrelation in the series, which would require a further model. 

> plot(time(temp), resid(temp.lm2), type = "l")
> abline(0, 0, col = "red")
> acf(resid(temp.lm2))
> pacf(resid(temp.lm2))


In Figure 5.7(a), there is no discernible curve in the series, which implies 
that a straight line is an adequate description of the trend. A tendency for the 
series to persist above or below the x-axis implies that the series is positively
autocorrelated. This is verified in the correlogram of the residuals, which shows 
a clear positive autocorrelation at lags 1-10 (Fig. 5.7b). 

Fig. 5.7. Residual diagnostic plots for the harmonic model fitted to the temperature
series (1970-2005): (a) the residuals plotted against time; (b) the correlogram of the
residuals (time units are months); (c) partial autocorrelations plotted against lag
(in months).


The correlogram in Figure 5.7 is similar to that expected of an AR(p)
process (§4.5.5). This is verified by the plot of the partial autocorrelations,
in which only the lag 1 and lag 2 partial autocorrelations are statistically
significant (Fig. 5.7c). In the code below, an AR(2) model is fitted to the
residual series:

> res.ar <- ar(resid(temp.lm2), method = "mle")
> res.ar$ar
[1] 0.494 0.307
> sd(res.ar$res[-(1:2)])
[1] 0.0837
> acf(res.ar$res[-(1:2)])


The correlogram of the residuals of the fitted AR(2) model is given in Figure 
5.8, from which it is clear that the residuals are approximately white noise. 
Hence, the final form of the model provides a good fit to the data. The fitted 
model for the monthly temperature series can be written as 


$$x_t = 0.175 + \frac{0.184(t - 1988)}{10.4} + 0.0204\sin(2\pi t) + 0.0162\sin(4\pi t) + z_t \qquad (5.13)$$

where $t$ is ‘time’ measured in units of ‘years’, the residual series $\{z_t\}$
follow an AR(2) process given by


$$z_t = 0.494 z_{t-1} + 0.307 z_{t-2} + w_t \qquad (5.14)$$


and $\{w_t\}$ is white noise with mean zero and standard deviation 0.0837.
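
One way to get a feel for the fitted model is to simulate a realisation from it. The sketch below uses the rounded values printed in Equations (5.13) and (5.14), so it only approximates the fitted model; the object names are introduced for this illustration, and Gaussian white noise is assumed for $\{w_t\}$.

```r
# Sketch: simulate a monthly series from the fitted model, Equations (5.13)
# and (5.14), using the rounded coefficients and assuming Gaussian w_t.
set.seed(1)
n <- 432                                         # months in 1970-2005
Time.yr <- seq(1970, by = 1/12, length.out = n)  # time in years

# AR(2) residual series; sd is passed on to the random number generator
z.sim <- arima.sim(n = n, model = list(ar = c(0.494, 0.307)), sd = 0.0837)

x.sim <- 0.175 + 0.184 * (Time.yr - 1988) / 10.4 +
  0.0204 * sin(2 * pi * Time.yr) + 0.0162 * sin(4 * pi * Time.yr) +
  as.numeric(z.sim)
x.sim <- ts(x.sim, start = c(1970, 1), frequency = 12)
plot(x.sim)
```

Comparing several such realisations with the time plot of the observed series gives an informal check on whether the fitted model reproduces the main features of the data.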

If we require an accurate assessment of the standard error, we should refit 
the model using gls, allowing for an AR(2) structure for the errors (Exer- 
cise 6). 

Fig. 5.8. Correlogram of the residuals of the AR(2) model fitted to the residuals of
the harmonic model for the temperature series.


5.7 Logarithmic transformations


5.7.1 Introduction 


Recall from §5.2 that the natural logarithm (base e) can be used to transform
a model with multiplicative components to a model with additive components.
For example, if $\{x_t\}$ is a time series given by


$$x_t = m'_t \, s'_t \, z'_t \qquad (5.15)$$


where $m'_t$ is the trend, $s'_t$ is the seasonal effect, and $z'_t$ is the
residual error, then the series $\{y_t\}$, given by


$$y_t = \log x_t = \log m'_t + \log s'_t + \log z'_t = m_t + s_t + z_t \qquad (5.16)$$


has additive components, so that if $m_t$ and $s_t$ are also linear functions,
the parameters in Equation (5.16) can be estimated by OLS. In Equation (5.16),
logs can be taken only if the series $\{x_t\}$ takes all positive values; i.e.,
$x_t > 0$ for all $t$. Conversely, a log-transformation may be seen as an
appropriate model formulation when a series can only take positive values and
has values near zero because the anti-log forces the predicted and simulated
values for $\{x_t\}$ to be positive.
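
The last point can be illustrated with a positive series that decays towards zero. In the sketch below the series is simulated with assumed parameter values; a straight line fitted to the raw values extrapolates below zero, whereas predictions obtained on the log scale and then anti-logged remain positive.

```r
# Sketch: predictions for a positive series near zero, with and without a
# log transformation (simulated data; values are illustrative).
set.seed(1)
Time <- 1:50
x <- exp(2 - 0.1 * Time + rnorm(50, sd = 0.3))   # positive, decaying series

raw.lm <- lm(x ~ Time)          # straight line fitted to x_t
log.lm <- lm(log(x) ~ Time)     # straight line fitted to log(x_t)

new.dat <- data.frame(Time = 51:60)
pred.raw <- predict(raw.lm, new.dat)        # can fall below zero
pred.log <- exp(predict(log.lm, new.dat))   # anti-log keeps values positive
rbind(raw = pred.raw, log = pred.log)
```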


5.7.2 Example using the air passenger series 


Consider the air passenger series from §1.4.1. Time plots of the original series
and the natural logarithm of the series can be obtained using the code below
and are shown in Figure 5.9.


> data(AirPassengers)
> AP <- AirPassengers
> plot(AP)
> plot(log(AP))


In Figure 5.9(a), the variance can be seen to increase as $t$ increases, whilst
after the logarithm is taken the variance is approximately constant over the
period of the record (Fig. 5.9b). Therefore, as the number of people using
the airline can also only be positive, the logarithm would be appropriate in
the model formulation for this time series. In the following code, a harmonic
model with polynomial trend is fitted to the air passenger series. The function
time is used to extract the time and create a standardised time variable TIME.


> SIN <- COS <- matrix(nr = length(AP), nc = 6)
> for (i in 1:6) {
    SIN[, i] <- sin(2 * pi * i * time(AP))
    COS[, i] <- cos(2 * pi * i * time(AP))
  }
> TIME <- (time(AP) - mean(time(AP)))/sd(time(AP))
> mean(time(AP))
[1] 1955
> sd(time(AP))
[1] 3.48

Fig. 5.9. Time plots of (a) the airline series (1949-1960) and (b) the natural
logarithm of the airline series.


> AP.lm1 <- lm(log(AP) ~ TIME + I(TIME^2) + I(TIME^3) + I(TIME^4) +
    SIN[,1] + COS[,1] + SIN[,2] + COS[,2] + SIN[,3] + COS[,3] +
    SIN[,4] + COS[,4] + SIN[,5] + COS[,5] + SIN[,6] + COS[,6])
> coef(AP.lm1)/sqrt(diag(vcov(AP.lm1)))
(Intercept)        TIME   I(TIME^2)   I(TIME^3)   I(TIME^4)    SIN[, 1]
    744.685      42.382      -4.162      -0.751       1.873       4.868
   COS[, 1]    SIN[, 2]    COS[, 2]    SIN[, 3]    COS[, 3]    SIN[, 4]
    -26.055      10.395      10.004      -4.844      -1.560      -5.666
   COS[, 4]    SIN[, 5]    COS[, 5]    SIN[, 6]    COS[, 6]
      1.946      -3.766       1.026       0.150      -0.521

> AP.lm2 <- lm(log(AP) ~ TIME + I(TIME^2) + SIN[,1] + COS[,1] +
    SIN[,2] + COS[,2] + SIN[,3] + SIN[,4] + COS[,4] + SIN[,5])
> coef(AP.lm2)/sqrt(diag(vcov(AP.lm2)))
(Intercept)        TIME   I(TIME^2)    SIN[, 1]    COS[, 1]    SIN[, 2]
     922.63      103.52       -8.24        4.92      -25.81       10.36
   COS[, 2]    SIN[, 3]    SIN[, 4]    COS[, 4]    SIN[, 5]
       9.96       -4.79       -5.61        1.95       -3.73

> AIC(AP.lm1)
[1] -448
> AIC(AP.lm2)
[1] -451

> acf(resid(AP.lm2))

Fig. 5.10. The correlogram (a) and partial autocorrelations (b) of the residual
series.


The residual correlogram indicates that the data are positively autocorre- 
lated (Fig. 5.10). As mentioned in §5.4, the standard errors of the parameter 
estimates are likely to be under-estimated if there is positive serial corre- 
lation in the data. This implies that predictor variables may falsely appear 
‘significant’ in the fitted model. In the code below, GLS is used to check the 
significance of the variables in the fitted model, using the lag 1 autocorrelation 
(approximately 0.6) from Figure 5.10. 

> AP.gls <- gls(log(AP) ~ TIME + I(TIME^2) + SIN[,1] + COS[,1] +
      SIN[,2] + COS[,2] + SIN[,3] + SIN[,4] + COS[,4] + SIN[,5],
      cor = corAR1(0.6))
> coef(AP.gls)/sqrt(diag(vcov(AP.gls)))
(Intercept)        TIME   I(TIME^2)    SIN[, 1]    COS[, 1]    SIN[, 2]
     398.84       45.85       -3.65        3.30      -18.18       11.77
   COS[, 2]    SIN[, 3]    SIN[, 4]    COS[, 4]    SIN[, 5]
      11.43       -7.63      -10.75        3.57       -7.92


In Figure 5.10(b), the partial autocorrelation plot suggests that the resid- 
ual series follows an AR(1) process, which is fitted to the series below: 


> AP.ar <- ar(resid(AP.lm2), order = 1, method = "mle")
> AP.ar$ar
[1] 0.641
> acf(AP.ar$res[-1])


The correlogram of the residuals of the fitted AR(1) model might be taken 
for white noise given that only one autocorrelation is significant (Fig. 5.11). 
However, the lag of this significant value corresponds to the seasonal lag (12) 
in the original series, which implies that the fitted model has failed to fully 
account for the seasonal variation in the data. Understandably, the reader 
might regard this as curious, given that the data were fitted using the full 
seasonal harmonic model. However, seasonal effects can be stochastic just 
as trends can, and the harmonic model we have used is deterministic. In 
Chapter 7, models with stochastic seasonal terms will be considered. 

Fig. 5.11. Correlogram of the residuals from the AR(1) model fitted to the residuals 
of the logarithm model. 


5.8 Non-linear models 


5.8.1 Introduction 


For the reasons given in §5.2, linear models are applicable to a wide range of 
time series. However, for some time series it may be more appropriate to fit 
a non-linear model directly rather than take logs or use a linear polynomial 
approximation. For example, if a series is known to derive from a known non- 
linear process, perhaps based on an underlying known deterministic law in 
science, then it would be better to use this information in the model formula- 
tion and fit a non-linear model directly to the data. In R, a non-linear model 
can be fitted by least squares using the function nls. 

In the previous section, we found that using the natural logarithm of a 
series could help stabilise the variance. However, using logs can present 
difficulties when a series contains negative values, because the log of a negative 
value is undefined. One way around this problem is to add a constant to all 
the terms in the series, so if $\{x_t\}$ is a series containing (some) negative values, 
then adding $c_0$ such that $c_0 > \max\{-x_t\}$ and then taking logs produces a 
transformed series $\{\log(c_0 + x_t)\}$ that is defined for all $t$. A linear model (e.g., 
a straight-line trend) could then be fitted to produce for $\{x_t\}$ the model 


$$x_t = -c_0 + e^{\alpha_0 + \alpha_1 t + z_t} \qquad (5.17)$$


where $\alpha_0$ and $\alpha_1$ are model parameters and $\{z_t\}$ is a residual series that may 
be autocorrelated. 

The main difficulty with the approach leading to Equation (5.17) is that 
$c_0$ should really be estimated like any other parameter in the model, whilst in 
practice a user will often arbitrarily choose a value that satisfies the constraint 
($c_0 > \max\{-x_t\}$). If there is a reason to expect a model similar to that in 
Equation (5.17) but there is no evidence for multiplicative residual terms, then 
the constant $c_0$ should be estimated with the other model parameters using 
non-linear least squares; i.e., the following model should be fitted: 


$$x_t = -c_0 + e^{\alpha_0 + \alpha_1 t} + z_t \qquad (5.18)$$

5.8.2 Example of a simulated and fitted non-linear series 


As non-linear models are generally fitted when the underlying non-linear func- 
tion is known, we will simulate a non-linear series based on Equation (5.18) 
with $c_0 = 0$ and compare parameters estimated using nls with those of the 
known underlying function. 

Below, a non-linear series with AR(1) residuals is simulated and plotted 
(Fig. 5.12): 


> set.seed(1)
> w <- rnorm(100, sd = 10)
> z <- rep(0, 100)
> for (t in 2:100) z[t] <- 0.7 * z[t - 1] + w[t]
> Time <- 1:100
> f <- function(x) exp(1 + 0.05 * x)
> x <- f(Time) + z
> plot(x, type = "l")
> abline(0, 0)


Fig. 5.12. Plot of a non-linear series containing negative values. 


The series plotted in Figure 5.12 has an apparent increasing exponential 
trend but also contains negative values, so that a direct log-transformation 
cannot be used and a non-linear model is needed. In R, a non-linear model is 
fitted by specifying a formula with the parameters and their starting values 
contained in a list: 


> x.nls <- nls(x ~ exp(alp0 + alp1 * Time), start = list(alp0 = 0.1,
      alp1 = 0.5))
> summary(x.nls)$parameters
     Estimate Std. Error t value Pr(>|t|)
alp0   1.1764   0.074295    15.8 9.20e-29
alp1   0.0483   0.000819    59.0 2.35e-78


The estimates for $\alpha_0$ and $\alpha_1$ are close to the underlying values that were 
used to simulate the data, although the standard errors of these estimates are 
likely to be underestimated because of the autocorrelation in the residuals.³ 


3 The generalised least squares function gnls (in the nlme library) can be used 
to fit non-linear models with autocorrelated residuals. However, in practice, 
computational difficulties often arise when using this function with non-linear models. 

5.9 Forecasting from regression 


5.9.1 Introduction 


A forecast is a prediction into the future. In the context of time series re- 
gression, a forecast involves extrapolating a fitted model into the future by 
evaluating the model function for a new series of times. The main problem 
with this approach is that the trends present in the fitted series may change 
in the future. Therefore, it is better to think of a forecast from a regression 
model as an expected value conditional on past trends continuing into the 
future. 


5.9.2 Prediction in R 


The generic function for making predictions in R is predict. The function 
essentially takes a fitted model and new data as parameters. The key to using 
this function with a regression model is to ensure that the new data are 
properly defined and labelled in a data.frame. 
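
As a minimal sketch of the labelling requirement, the example below uses made-up data (a straight-line trend plus noise, not one of the series in this book) and checks that predict reproduces the forecasts computed directly from the fitted coefficients. The names fit, new.dat, and by.hand are illustrative only.

```r
# Made-up data: a straight-line trend with Gaussian noise
set.seed(1)
Time <- 1:50
y <- 2 + 0.5 * Time + rnorm(50)
fit <- lm(y ~ Time)

# The column name in the new data frame must match the variable
# name used in the model formula (here, Time)
new.dat <- data.frame(Time = 51:60)
pred <- predict(fit, new.dat)

# The same forecasts computed directly from the fitted coefficients
by.hand <- coef(fit)[1] + coef(fit)[2] * new.dat$Time
all.equal(unname(pred), unname(by.hand))
```

If the column in new.dat were given a different name, predict would not find it there and would look for Time in the workspace instead, returning fitted values for the original times rather than forecasts.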

In the code below, we use this function in the fitted regression model 
of §5.7.2 to forecast the number of air passengers travelling for the 10-year 
period that follows the record (Fig. 5.13). The forecast is given by applying 
the exponential function (anti-log) to predict because the regression model 
was fitted to the logarithm of the series: 


> new.t <- time(ts(start = 1961, end = c(1970, 12), fr = 12))
> TIME <- (new.t - mean(time(AP)))/sd(time(AP))
> SIN <- COS <- matrix(nr = length(new.t), nc = 6)
> for (i in 1:6) {
      COS[, i] <- cos(2 * pi * i * new.t)
      SIN[, i] <- sin(2 * pi * i * new.t)
  }
> SIN <- SIN[, -6]
> new.dat <- data.frame(TIME = as.vector(TIME), SIN = SIN,
      COS = COS)
> AP.pred.ts <- exp(ts(predict(AP.lm2, new.dat), st = 1961,
      fr = 12))
> ts.plot(log(AP), log(AP.pred.ts), lty = 1:2)
> ts.plot(AP, AP.pred.ts, lty = 1:2)


5.10 Inverse transform and bias correction 


5.10.1 Log-normal residual errors 


The forecasts in Figure 5.13(b) were obtained by applying the anti-log to the 
forecasted values obtained from the log-regression model. However, the process 
of using a transformation, such as the logarithm, and then applying an inverse 
transformation introduces a bias in the forecasts of the mean values. If the 
regression model closely fits the data, this bias will be small (as shown in the 
next example for the airline predictions). Note that a bias correction is only 
for means and should not be used in simulations. 

Fig. 5.13. Air passengers (1949-1960; solid line) and forecasts (1961-1970; dotted 
lines): (a) logarithm and forecasted values; (b) original series and anti-log of the 
forecasted values. 

The bias in the means arises as a result of applying the inverse transform 
to a residual series. For example, if the residual series is Gaussian white noise 
$\{w_t\}$, with mean zero and standard deviation $\sigma$, then the distribution of the 
inverse-transform (the anti-log) of the series is log-normal with mean $e^{\sigma^2/2}$. 
This can be verified theoretically, or empirically by simulation as in the code 
below: 


> set.seed(1) 

> sigma <- 1 

> w <- rnorm(1e+06, sd = sigma) 
> mean(w) 


[1] 4.69e-05 
> mean(exp(w))
[1] 1.65 

> exp(sigma^2/2) 
[1] 1.65 


The code above indicates that the mean of the anti-log of the Gaussian 
white noise and the expected mean from a log-normal distribution are equal. 
Hence, for a Gaussian white noise residual series, a correction factor of $e^{\sigma^2/2}$ 
should be applied to the forecasts of means. The importance of this correction 
factor really depends on the value of $\sigma^2$. If $\sigma^2$ is very small, the correction 
factor will hardly change the forecasts at all and so could be neglected 
without major concern, especially as errors from other sources are likely to be 
significantly greater. 
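
The theoretical verification mentioned above can be sketched by completing the square in the exponent of the Gaussian density. The derivation below assumes $w \sim N(0, \sigma^2)$, as in the simulation:

```latex
\begin{aligned}
E\left(e^{w}\right)
  &= \int_{-\infty}^{\infty} e^{w} \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-w^{2}/(2\sigma^{2})} \, dw \\
  &= e^{\sigma^{2}/2} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(w-\sigma^{2})^{2}/(2\sigma^{2})} \, dw \\
  &= e^{\sigma^{2}/2}
\end{aligned}
```

The second line uses $w - w^2/(2\sigma^2) = \sigma^2/2 - (w - \sigma^2)^2/(2\sigma^2)$, and the remaining integral is that of a normal density with mean $\sigma^2$, which equals 1. With $\sigma = 1$ this gives $e^{1/2} \approx 1.65$, in agreement with the simulated value.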


5.10.2 Empirical correction factor for forecasting means 


The $e^{\sigma^2/2}$ correction factor can be used when the residual series of the fitted 
log-regression model is Gaussian white noise. In general, however, the 
distribution of the residuals from the log regression (Exercise 5) is often negatively 
skewed, in which case a correction factor can be determined empirically 
using the mean of the anti-log of the residual series. In this approach, adjusted 
forecasts $\{\hat{x}'_t\}$ can be obtained from 


$$\hat{x}'_t = e^{\widehat{\log x_t}} \sum_{t=1}^{n} e^{z_t}/n \qquad (5.19)$$


where $\{\widehat{\log x_t} : t = 1, \ldots, n\}$ is the predicted series given by the fitted 
log-regression model, and $\{z_t\}$ is the residual series from this fitted model. 
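
Equation (5.19) amounts to multiplying the anti-logged predictions by the mean of the anti-logged residuals. A minimal sketch is given below; the function name adjust.forecast and the numbers are made up purely for illustration and are not taken from any fitted model.

```r
# Hypothetical helper implementing the empirical correction:
# anti-log of the predictions times the mean anti-log of the residuals
adjust.forecast <- function(log.pred, log.resid) {
    exp(log.pred) * mean(exp(log.resid))
}

# Made-up predictions and residuals on the log scale
log.pred <- c(5.0, 5.1, 5.2)
log.resid <- c(-0.3, 0.1, 0.2)
adjusted <- adjust.forecast(log.pred, log.resid)

# Ratio of adjusted to unadjusted forecasts (the correction factor)
adjusted / exp(log.pred)
```

Because the residuals here average zero on the log scale, the mean of their anti-logs exceeds 1, so the adjusted forecasts are slightly larger than the unadjusted ones.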

The following example illustrates the procedure for calculating the correc- 
tion factors. 


5.10.3 Example using the air passenger data 


For the airline series, the forecasts can be adjusted by multiplying the 
predictions by $e^{\sigma^2/2}$, where $\sigma$ is the standard deviation of the residuals, or using an 
empirical correction factor as follows: 

> summary(AP.lm2)$r.sq
[1] 0.989
> sigma <- summary(AP.lm2)$sigma
> lognorm.correction.factor <- exp((1/2) * sigma^2)
> empirical.correction.factor <- mean(exp(resid(AP.lm2)))
> lognorm.correction.factor
[1] 1.001171
> empirical.correction.factor
[1] 1.001080
> AP.pred.ts <- AP.pred.ts * empirical.correction.factor


The adjusted forecasts in AP.pred.ts allow for the bias in taking the anti-log 
of the predictions. However, the small $\sigma$ (and $R^2 = 0.99$) results in a small 
correction factor (of the order 0.1%), which is probably negligible compared 
with other sources of errors that exist in the forecasts. Whilst in this example 
the correction factor is small, there is no reason why it will be small in general. 


5.11 Summary of R commands 


lm        fits a linear (regression) model 
coef      extracts the parameter estimates from a fitted model 
confint   returns a (95%) confidence interval for the parameters 
          of a fitted model 
gls       fits a linear model using generalised least squares 
          (allowing for autocorrelated residuals) 
factor    returns variables in the form of ‘factors’ or indicator 
          variables 


5.12 Exercises 


1. a) Produce a time plot for $\{x_t : t = 1, \ldots, 100\}$, where 
$x_t = 70 + 2t - 3t^2 + z_t$, $\{z_t\}$ is the AR(1) process $z_t = 0.5 z_{t-1} + w_t$, 
and $\{w_t\}$ is white noise with standard deviation 25. 

b) Fit a quadratic trend to the series $\{x_t\}$. Give the coefficients of the 
fitted model. 

c) Find a 95% confidence interval for the parameters of the quadratic 
model, and comment. 

d) Plot the correlogram of the residuals and comment. 

e) Refit the model using GLS. Give the standard errors of the parameter 
estimates, and comment. 


2. The standard errors of the parameter estimates of a fitted regression model 
are likely to be underestimated if there is positive serial correlation in the 
data. This implies that explanatory variables may appear as ‘significant’ 
when they should not. Use GLS to check the significance of the variables 
of the fitted model from §5.6.3. Use an appropriate estimate of the lag 1 
autocorrelation within gls. 


3. This question is based on the electricity production series (1958-1990). 

a) Give two reasons why a log-transformation may be appropriate for the 
electricity series. 

b) Fit a seasonal indicator model with a quadratic trend to the (natural) 
logarithm of the series. Use stepwise regression to select the best model 
based on the AIC. 

c) Fit a harmonic model with a quadratic trend to the logarithm of the 
series. Use stepwise regression to select the best model based on the 
AIC. 

d) Plot the correlogram and partial correlogram of the residuals from the 
overall best-fitting model and comment on the plots. 

e) Fit an AR model to the residuals of the best-fitting model. Give the 
order of the best-fitting AR model and the estimated model parameters. 

f) Plot the correlogram of the residuals of the AR model, and comment. 

g) Write down in full the equation of the best-fitting model. 

h) Use the best-fitting model to forecast electricity production for the 
years 1991-2000, making sure you have corrected for any bias due to 
taking logs. 


4. Suppose a sample of size n follows an AR(1) process with lag 1 
autocorrelation $\rho_1 = \alpha$. Use Equation (5.5) to find the variance of the sample 
mean. 


5. A hydrologist wishes to simulate monthly inflows to the Font Reservoir 
over the next 10-year period. Use the data in Font.dat (§2.3.3) to answer 
the following: 

a) Regress inflow on month using indicator variables and time t, and fit 
a suitable AR model to the residual error series. 

b) Plot a histogram of the residual errors of the fitted AR model, and 
comment on the plot. Fit back-to-back Weibull distributions to the 
errors. 

c) Simulate 20 realisations of inflow for the next 10 years. 

d) Give reasons why a log transformation may be suitable for the series 
of inflows. 

e) Regress log(inflow) on month using indicator variables and time t 
(as above), and fit a suitable AR model to the residual error series. 

f) Plot a histogram of the residual errors of the fitted AR model, and 
comment on the plot. Fit a back-to-back Weibull distribution to the 
residual errors. 

g) Simulate 20 realisations of log(inflow) for the next 10 years. Take 
anti-logs of the simulated values to produce a series of simulated flows. 

h) Compare both sets of simulated flows, and discuss which is the more 
satisfactory. 


6. Refit the harmonic model to the temperature series using gls, allowing 
for errors from an AR(2) process. 

a) Construct a 99% confidence interval for the coefficient of time. 

b) Plot the residual error series from the model fitted using GLS against 
the residual error series from the model fitted using OLS. 

c) Refit the AR(2) model to the residuals from the fitted (GLS) model. 

d) How different are the fitted models? 

e) Calculate the annual means. Use OLS to regress the annual mean 
temperature on time, and construct a 99% confidence interval for its 
coefficient. 


6 


Stationary Models 


6.1 Purpose 


As seen in the previous chapters, a time series will often have well-defined 
components, such as a trend and a seasonal pattern. A well-chosen linear re- 
gression may account for these non-stationary components, in which case the 
residuals from the fitted model should not contain noticeable trend or seasonal 
patterns. However, the residuals will usually be correlated in time, as this is 
not accounted for in the fitted regression model. Similar values may cluster to- 
gether in time; for example, monthly values of the Southern Oscillation Index, 
which is closely associated with El Nino, tend to change slowly and may give 
rise to persistent weather patterns. Alternatively, adjacent observations may 
be negatively correlated; for example, an unusually high monthly sales figure 
may be followed by an unusually low value because customers have supplies 
left over from the previous month. In this chapter, we consider stationary 
models that may be suitable for residual series that contain no obvious trends 
or seasonal cycles. The fitted stationary models may then be combined with 
the fitted regression model to improve forecasts. The autoregressive models 
that were introduced in §4.5 often provide satisfactory models for the residual 
time series, and we extend the repertoire in this chapter. The term stationary 
was discussed in previous chapters; we now give a more rigorous definition. 


6.2 Strictly stationary series 


A time series model $\{x_t\}$ is strictly stationary if the joint statistical 
distribution of $x_{t_1}, \ldots, x_{t_n}$ is the same as the joint distribution of 
$x_{t_1+m}, \ldots, x_{t_n+m}$ for all $t_1, \ldots, t_n$ and $m$, so that the distribution is 
unchanged after an arbitrary time shift. Note that strict stationarity implies that 
the mean and variance are constant in time and that the autocovariance 
$\mathrm{Cov}(x_t, x_s)$ only depends on lag $k = |t - s|$ and can be written $\gamma(k)$. If a series 
is not strictly stationary but the mean and variance are constant in time and the 
autocovariance only depends on the lag, then the series is called second-order 
stationary.¹ We focus on the second-order properties in this chapter, but the 
stochastic processes discussed are strictly stationary. Furthermore, if the white 
noise is Gaussian, the stochastic process is completely defined by the mean and 
covariance structure, in the same way as any normal distribution is defined by its 
mean and variance-covariance matrix. 

Stationarity is an idealisation that is a property of models. If we fit a 
stationary model to data, we assume our data are a realisation of a stationary 
process. So our first step in an analysis should be to check whether there is any 
evidence of a trend or seasonal effects and, if there is, remove them. Regression 
can break down a non-stationary series to a trend, seasonal components, and 
residual series. It is often reasonable to treat the time series of residuals as a 
realisation of a stationary error series. Therefore, the models in this chapter 
are often fitted to residual series arising from regression analyses. 


6.3 Moving average models 


6.3.1 MA(q) process: Definition and properties 


A moving average (MA) process of order q is a linear combination of the 
current white noise term and the q most recent past white noise terms and is 
defined by 

$$x_t = w_t + \beta_1 w_{t-1} + \ldots + \beta_q w_{t-q} \qquad (6.1)$$


where $\{w_t\}$ is white noise with zero mean and variance $\sigma_w^2$. Equation (6.1) 
can be rewritten in terms of the backward shift operator $B$ 


$$x_t = (1 + \beta_1 B + \beta_2 B^2 + \cdots + \beta_q B^q) w_t = \phi_q(B) w_t \qquad (6.2)$$


where $\phi_q$ is a polynomial of order q. Because MA processes consist of a finite 
sum of stationary white noise terms, they are stationary and hence have a 
time-invariant mean and autocovariance. 

The mean and variance for $\{x_t\}$ are easy to derive. The mean is just zero 
because it is a sum of terms that all have a mean of zero. The variance is 
$\sigma_w^2 (1 + \beta_1^2 + \ldots + \beta_q^2)$ because each of the white noise terms has the same 
variance and the terms are mutually independent. The autocorrelation function, 
for $k \ge 0$, is given by 


$$\rho(k) = \begin{cases} 1 & k = 0 \\ \sum_{i=0}^{q-k} \beta_i \beta_{i+k} \Big/ \sum_{i=0}^{q} \beta_i^2 & k = 1, \ldots, q \\ 0 & k > q \end{cases} \qquad (6.3)$$


where $\beta_0$ is unity. The function is zero when $k > q$ because $x_t$ and $x_{t+k}$ 
then consist of sums of independent white noise terms and so have covariance 
zero. The derivation of the autocorrelation function is left to Exercise 1. An 
MA process is invertible if it can be expressed as a stationary autoregressive 
process of infinite order without an error term. For example, the MA process 
$x_t = (1 - \beta B) w_t$ can be expressed as 


$$w_t = (1 - \beta B)^{-1} x_t = x_t + \beta x_{t-1} + \beta^2 x_{t-2} + \ldots \qquad (6.4)$$


provided $|\beta| < 1$, which is required for convergence. 

In general, an MA(q) process is invertible when the roots of $\phi_q(B)$ all 
exceed unity in absolute value (Exercise 2). The autocovariance function only 
identifies a unique MA(q) process if the condition that the process be invertible 
is imposed. The estimation procedure described in §6.4 leads naturally to 
invertible models. 


1 For example, the skewness, or more generally $E(x_t x_{t+k} x_{t+l})$, might change over 
time. 
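
The role of the convergence condition in Equation (6.4) can be seen by repeated substitution. The sketch below assumes the MA(1) form $x_t = (1 - \beta B) w_t$ of the example, so that $w_t = x_t + \beta w_{t-1}$:

```latex
\begin{aligned}
w_t &= x_t + \beta w_{t-1} \\
    &= x_t + \beta x_{t-1} + \beta^{2} w_{t-2} \\
    &= \sum_{i=0}^{n} \beta^{i} x_{t-i} + \beta^{n+1} w_{t-n-1}
\end{aligned}
```

The remainder term $\beta^{n+1} w_{t-n-1}$ has variance $\beta^{2(n+1)} \sigma_w^2$, which tends to zero as $n \to \infty$ only when $|\beta| < 1$. In that case the infinite sum of past observations recovers $w_t$, which is the autoregressive representation of infinite order.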


6.3.2 R examples: Correlogram and simulation 


The autocorrelation function for an MA(q) process (Equation (6.3)) can 
readily be implemented in R, and a simple version, without any detailed error 
checks, is given below. Note that the function takes the lag k and the model 
parameters $\beta_i$ for $i = 0, 1, \ldots, q$, with $\beta_0 = 1$. For the non-zero values (i.e., 
values within the else part of the if-else statement), the autocorrelation 
function is computed in two stages using for loops. The first loop generates 
a sum (s1) for the autocovariance, whilst the second loop generates a sum 
(s2) for the variance, with the division of the two sums giving the returned 
autocorrelation (ACF). 


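> rho <- function(k, beta) {
      q <- length(beta) - 1                  # beta[1] holds beta_0 = 1
      if (k > q) ACF <- 0 else {
          s1 <- 0; s2 <- 0
          # s1: sum of beta_i * beta_(i+k), proportional to the autocovariance
          for (i in 1:(q-k+1)) s1 <- s1 + beta[i] * beta[i+k]
          # s2: sum of beta_i^2, proportional to the variance
          for (i in 1:(q+1)) s2 <- s2 + beta[i]^2
          ACF <- s1 / s2}
      ACF}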


Using the code above for the autocorrelation function, correlograms for a range 
of MA(q) processes can be plotted against lag; the code below provides an 
example for an MA(3) process with parameters $\beta_1 = 0.7$, $\beta_2 = 0.5$, and 
$\beta_3 = 0.2$ (Fig. 6.1a). 


> beta <- c(1, 0.7, 0.5, 0.2)
> rho.k <- rep(1, 10)
> for (k in 1:10) rho.k[k] <- rho(k, beta)
> plot(0:10, c(1, rho.k), pch = 4, ylab = expression(rho[k]))
> abline(0, 0)


Fig. 6.1. Plots of the autocorrelation functions for two MA(3) processes: 
(a) $\beta_1 = 0.7$, $\beta_2 = 0.5$, $\beta_3 = 0.2$; (b) $\beta_1 = -0.7$, $\beta_2 = 0.5$, $\beta_3 = -0.2$. 


The plot in Figure 6.1(b) is the autocorrelation function for an MA(3) process 
with parameters $\beta_1 = -0.7$, $\beta_2 = 0.5$, and $\beta_3 = -0.2$, which has negative 
correlations at lags 1 and 3. The function expression is used to get the 
Greek symbol $\rho$. 
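
As a cross-check on Equation (6.3), the values from a hand-written autocorrelation function can be compared with those returned by the function ARMAacf in the stats library, which computes the theoretical autocorrelations of an ARMA process. The sketch below uses a compact, self-contained version of the function (named rho here for convenience) and the MA(3) parameters of Figure 6.1(a).

```r
# Compact version of the MA(q) autocorrelation in Equation (6.3);
# beta holds beta_0 = 1 followed by beta_1, ..., beta_q
rho <- function(k, beta) {
    q <- length(beta) - 1
    if (k > q) return(0)
    i <- 1:(q - k + 1)
    sum(beta[i] * beta[i + k]) / sum(beta^2)
}

beta <- c(1, 0.7, 0.5, 0.2)
by.formula <- sapply(0:10, rho, beta = beta)

# ARMAacf takes the MA coefficients without the leading 1 and
# returns the autocorrelations for lags 0, 1, ..., lag.max
built.in <- ARMAacf(ma = beta[-1], lag.max = 10)
all.equal(by.formula, unname(built.in))
```

Agreement between the two vectors gives some reassurance that the index shifts in the loops (R vectors start at 1, whereas the $\beta_i$ start at $i = 0$) have been handled correctly.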

The code below can be used to simulate the MA(3) process and plot the 
correlogram of the simulated series. An example time plot and correlogram are 
shown in Figure 6.2. As expected, the first three autocorrelations are 
significantly different from 0 (Fig. 6.2b); other statistically significant correlations 
are attributable to random sampling variation. Note that in the correlogram 
plot (Fig. 6.2b) 1 in 20 (5%) of the sample correlations for lags greater than 
3, for which the underlying population correlation is zero, are expected to be 
statistically significantly different from zero at the 5% level because multiple 
t-test results are being shown on the plot. 


> set.seed(1)
> b <- c(0.8, 0.6, 0.4)
> x <- w <- rnorm(1000)
> for (t in 4:1000) {
      for (j in 1:3) x[t] <- x[t] + b[j] * w[t - j]
  }
> plot(x, type = "l")
> acf(x)
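
The double loop follows the model definition closely, but the same weighted sum can be obtained in one step with the function filter from the stats library. The sketch below is an alternative under that assumption, not the method used in the text; the names x.loop and x.filt are illustrative.

```r
set.seed(1)
b <- c(0.8, 0.6, 0.4)
w <- rnorm(1000)

# Explicit loops, as in the simulation above
x.loop <- w
for (t in 4:1000) {
    for (j in 1:3) x.loop[t] <- x.loop[t] + b[j] * w[t - j]
}

# One-sided convolution filter with weights (1, beta_1, beta_2, beta_3);
# the first three values are NA because earlier noise terms are unavailable
x.filt <- stats::filter(w, c(1, b), sides = 1)

all.equal(x.loop[4:1000], as.numeric(x.filt)[4:1000])
```

The two series agree from t = 4 onwards. They differ in the first three values, where the loop version leaves x equal to w and the filter version returns NA.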


6.4 Fitted MA models 


6.4.1 Model fitted to simulated series 


An MA(q) model can be fitted to data in R using the arima function with 
the order function parameter set to c(0,0,q). Unlike the function ar, the 
function arima does not subtract the mean by default and estimates an 
intercept term. MA models cannot be expressed in a multiple regression form, 
and, in general, the parameters are estimated with a numerical algorithm. The 
function arima minimises the conditional sum of squares to estimate values of 
the parameters and will either return these if method=c("CSS") is specified 
or use them as initial values for maximum likelihood estimation. 

Fig. 6.2. (a) Time plot and (b) correlogram for a simulated MA(3) process. 

A description of the conditional sum of squares algorithm for fitting an 
MA(q) process follows. For any choice of parameters, the sum of squared 
residuals can be calculated iteratively by rearranging Equation (6.1) and 
replacing the errors, $w_t$, with their estimates (that is, the residuals), which are 
denoted by $\hat{w}_t$: 


$$S(\hat\beta_1, \ldots, \hat\beta_q) = \sum_{t=1}^{n} \hat{w}_t^2 = \sum_{t=1}^{n} \left\{ x_t - (\hat\beta_1 \hat{w}_{t-1} + \ldots + \hat\beta_q \hat{w}_{t-q}) \right\}^2 \qquad (6.5)$$

conditional on $\hat{w}_0, \ldots, \hat{w}_{t-q}$ being taken as 0 to start the iteration. A numerical 
search is used to find the parameter values that minimise this conditional sum 
of squares. 
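
The iteration in Equation (6.5) can be sketched in a few lines of R. The function name css.ma and the vector w.hat are hypothetical, and the one-dimensional search with optimize is used only for illustration on a simulated MA(1) series; it is not a description of the algorithm used inside arima.

```r
# Conditional sum of squares for an MA(q) model with trial parameters
# beta = (beta_1, ..., beta_q); the start-up residuals are taken as 0
css.ma <- function(beta, x) {
    q <- length(beta)
    n <- length(x)
    w.hat <- numeric(n + q)            # w.hat[t + q] stores the residual at time t
    for (t in 1:n) {
        past <- w.hat[(t + q - 1):t]   # residuals at times t-1, ..., t-q
        w.hat[t + q] <- x[t] - sum(beta * past)
    }
    sum(w.hat[-(1:q)]^2)
}

# Simulated MA(1) series with beta_1 = 0.8 (value chosen for illustration)
set.seed(1)
w <- rnorm(1000)
x <- w
for (t in 2:1000) x[t] <- w[t] + 0.8 * w[t - 1]

# Numerical search over the invertible region for the minimising value
beta.hat <- optimize(css.ma, interval = c(-0.99, 0.99), x = x)$minimum
beta.hat
```

At the true parameter value the recursion recovers the simulated white noise exactly, so css.ma(0.8, x) equals sum(w^2); the minimising value beta.hat should be close to, but not exactly, 0.8 because of sampling variation.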

In the following code, a moving average model, x.ma, is fitted to the 
simulated series of the last section. Looking at the parameter estimates (coefficients 
in the output below), it can be seen that the 95% confidence intervals 
(approximated by coeff. ± 2 s.e. of coeff.) contain the underlying parameter values (0.8, 
0.6, and 0.4) that were used in the simulations. Furthermore, also as expected, 
the intercept is not significantly different from its underlying parameter value 
of zero. 


> x.ma <- arima(x, order = c(0, 0, 3)) 
> x.ma 


Call: 
arima(x = x, order = c(0, 0, 3)) 


Coefficients:
        ma1    ma2    ma3  intercept
      0.790  0.566  0.396     -0.032
s.e.  0.031  0.035  0.032      0.090


sigma^2 estimated as 1.07: log likelihood = -1452, aic = 2915 


It is possible to set the value for the mean to zero, rather than estimate 
the intercept, by using include.mean=FALSE within the arima function. This 
option should be used with caution and would only be appropriate if you 
wanted $\{x_t\}$ to represent displacement from some known fixed mean. 


6.4.2 Exchange rate series: Fitted MA model 


In the code below, an MA(1) model is fitted to the exchange rate series. 
If you refer back to §4.6.2, a comparison with the output below indicates 
that the AR(1) model provides the better fit, as it has the smaller estimated 
variance of the residual series, 0.031 compared with 0.042. Furthermore, the 
correlogram of the residuals indicates that an MA(1) model does not provide 
a satisfactory fit, as the residual series is clearly not a realistic realisation of 
white noise (Fig. 6.3). 


> www <- "http://www.massey.ac.nz/~pscowper/ts/pounds_nz.dat"
> x <- read.table(www, header = T)
> x.ts <- ts(x, st = 1991, fr = 4)
> x.ma <- arima(x.ts, order = c(0, 0, 1))
> x.ma

Call:
arima(x = x.ts, order = c(0, 0, 1))

Coefficients:
        ma1  intercept
      1.000      2.833
s.e.  0.072      0.065

sigma^2 estimated as 0.0417:  log likelihood = 4.76,  aic = -3.53

> acf(x.ma$res[-1])


Fig. 6.3. The correlogram of residual series for the MA(1) model fitted to the 
exchange rate data. 


6.5 Mixed models: The ARMA process 
6.5.1 Definition 


Recall from Chapter 4 that a series $\{x_t\}$ is an autoregressive process of order p, 
an AR(p) process, if 


$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \ldots + \alpha_p x_{t-p} + w_t \qquad (6.6)$$


where $\{w_t\}$ is white noise and the $\alpha_i$ are the model parameters. A useful 
class of models is obtained when AR and MA terms are added together in a 
single expression. A time series $\{x_t\}$ follows an autoregressive moving average 
(ARMA) process of order (p, q), denoted ARMA(p, q), when 


$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \ldots + \alpha_p x_{t-p} + w_t + \beta_1 w_{t-1} + \beta_2 w_{t-2} + \ldots + \beta_q w_{t-q} \qquad (6.7)$$


where $\{w_t\}$ is white noise. Equation (6.7) may be represented in terms of the 
backward shift operator B and rearranged in the more concise polynomial 
form 


$$\theta_p(B) x_t = \phi_q(B) w_t \qquad (6.8)$$

The following points should be noted about an ARMA(p, q) process: 


(a) The process is stationary when the roots of $\theta$ all exceed unity in absolute 
value. 

(b) The process is invertible when the roots of $\phi$ all exceed unity in absolute 
value. 

(c) The AR(p) model is the special case ARMA(p, 0). 

(d) The MA(q) model is the special case ARMA(0, q). 

(e) Parameter parsimony. When fitting to data, an ARMA model will often 
be more parameter efficient (i.e., require fewer parameters) than a single 
MA or AR model. 

(f) Parameter redundancy. When $\theta$ and $\phi$ share a common factor, a stationary 
model can be simplified. For example, the model 
$(1 - \frac{1}{2}B)(1 - \frac{1}{3}B) x_t = (1 - \frac{1}{2}B) w_t$ can be written $(1 - \frac{1}{3}B) x_t = w_t$. 


6.5.2 Derivation of second-order properties* 


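In order to derive the second-order properties for an ARMA(p, q) process 
$\{x_t\}$, it is helpful first to express the $x_t$ in terms of white noise components 
$w_t$ because white noise terms are independent. We illustrate the procedure for 
the ARMA(1, 1) model. 

The ARMA(1, 1) process for $\{x_t\}$ is given by 


$$x_t = \alpha x_{t-1} + w_t + \beta w_{t-1} \qquad (6.9)$$


where $w_t$ is white noise, with $E(w_t) = 0$ and $\mathrm{Var}(w_t) = \sigma_w^2$. Rearranging 
Equation (6.9) to express $x_t$ in terms of white noise components, 


$$x_t = (1 - \alpha B)^{-1} (1 + \beta B) w_t$$


Expanding the right-hand side, 


$$\begin{aligned}
x_t &= (1 + \alpha B + \alpha^2 B^2 + \ldots)(1 + \beta B) w_t \\
    &= \left( \sum_{i=0}^{\infty} \alpha^i B^i \right) (1 + \beta B) w_t \\
    &= \left( 1 + \sum_{i=0}^{\infty} \alpha^{i+1} B^{i+1} + \sum_{i=0}^{\infty} \alpha^i \beta B^{i+1} \right) w_t \\
    &= w_t + (\alpha + \beta) \sum_{i=1}^{\infty} \alpha^{i-1} w_{t-i}
\end{aligned} \qquad (6.10)$$
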


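With the equation in the form above, the second-order properties follow. For 
example, the mean $E(x_t)$ is clearly zero because $E(w_{t-i}) = 0$ for all i, and 
the variance is given by 


$$\begin{aligned}
\mathrm{Var}(x_t) &= \mathrm{Var}\left[ w_t + (\alpha + \beta) \sum_{i=1}^{\infty} \alpha^{i-1} w_{t-i} \right] \\
                  &= \sigma_w^2 + \sigma_w^2 (\alpha + \beta)^2 (1 - \alpha^2)^{-1}
\end{aligned} \qquad (6.11)$$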
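
The step between the two lines of Equation (6.11) uses the independence of the white noise terms, so that the variance of the sum is the sum of the variances, followed by a geometric series. A sketch of the intermediate working, assuming $|\alpha| < 1$:

```latex
\begin{aligned}
\mathrm{Var}(x_t)
  &= \sigma_w^{2} + (\alpha + \beta)^{2} \sum_{i=1}^{\infty} \alpha^{2(i-1)} \sigma_w^{2} \\
  &= \sigma_w^{2} + \sigma_w^{2} (\alpha + \beta)^{2} \left( 1 + \alpha^{2} + \alpha^{4} + \ldots \right) \\
  &= \sigma_w^{2} + \sigma_w^{2} (\alpha + \beta)^{2} (1 - \alpha^{2})^{-1}
\end{aligned}
```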


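The autocovariance $\gamma_k$, for $k > 0$, is given by 


$$\begin{aligned}
\mathrm{Cov}(x_t, x_{t+k}) &= (\alpha + \beta) \alpha^{k-1} \sigma_w^2 + (\alpha + \beta)^2 \sigma_w^2 \sum_{i=1}^{\infty} \alpha^{2i+k-2} \\
                           &= (\alpha + \beta) \alpha^{k-1} \sigma_w^2 + (\alpha + \beta)^2 \sigma_w^2 \alpha^k (1 - \alpha^2)^{-1}
\end{aligned} \qquad (6.12)$$

The autocorrelation $\rho_k$ then follows as 


$$\begin{aligned}
\rho_k &= \gamma_k / \gamma_0 = \mathrm{Cov}(x_t, x_{t+k}) / \mathrm{Var}(x_t) \\
       &= \frac{\alpha^{k-1} (\alpha + \beta)(1 + \alpha\beta)}{1 + 2\alpha\beta + \beta^2}
\end{aligned} \qquad (6.13)$$


Note that Equation (6.13) implies $\rho_k = \alpha \rho_{k-1}$ for $k \ge 2$. 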
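
The autocorrelation of the ARMA(1, 1) process can be checked numerically against the function ARMAacf in the stats library. The sketch below takes Equation (6.13) to be $\rho_k = \alpha^{k-1} (\alpha + \beta)(1 + \alpha\beta) / (1 + 2\alpha\beta + \beta^2)$ and uses the illustrative values $\alpha = -0.6$ and $\beta = 0.5$.

```r
alpha <- -0.6
beta <- 0.5
k <- 1:5

# Autocorrelations at lags 1 to 5 from the closed-form expression
rho.k <- alpha^(k - 1) * (alpha + beta) * (1 + alpha * beta) /
    (1 + 2 * alpha * beta + beta^2)

# Theoretical ACF from ARMAacf; the first element is lag 0, so drop it
built.in <- ARMAacf(ar = alpha, ma = beta, lag.max = 5)[-1]
all.equal(rho.k, unname(built.in))

# Each autocorrelation is alpha times the previous one
rho.k[-1] / rho.k[-5]
```

The ratios in the last line all equal $\alpha$, so the autocorrelations of an ARMA(1, 1) process decay geometrically after lag 1, as for an AR(1) process, but from a starting value $\rho_1$ that depends on both $\alpha$ and $\beta$.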


6.6 ARMA models: Empirical analysis 


6.6.1 Simulation and fitting 


The ARMA process, and the more general ARIMA processes discussed in the 
next chapter, can be simulated using the R function arima.sim, which takes a 
list of coefficients representing the AR and MA parameters. An ARMA(p, q) 
model can be fitted using the arima function with the order function param- 
eter set to c(p, 0, q). The fitting algorithm proceeds similarly to that for 
an MA process. Below, data from an ARMA(1, 1) process are simulated for 
$\alpha = -0.6$ and $\beta = 0.5$ (Equation (6.7)), and an ARMA(1, 1) model is fitted to 
the simulated series. As expected, the sample estimates of $\alpha$ and $\beta$ are close 
to the underlying model parameters. 


> set.seed(1) 
> x <- arima.sim(n = 10000, list(ar = -0.6, ma = 0.5)) 
> coef(arima(x, order = c(1, 0, 1))) 


      ar1       ma1 intercept
 -0.59697   0.50270  -0.00657


6.6.2 Exchange rate series 


In §6.4.2, a simple MA(1) model failed to provide an adequate fit to the exchange 
rate series. In the code below, fitted MA(1), AR(1) and ARMA(1, 1) models 
are compared using the AIC. The ARMA(1, 1) model provides the best fit 
to the data, followed by the AR(1) model, with the MA(1) model providing 
the poorest fit. The correlogram in Figure 6.4 indicates that the residuals of 
the fitted ARMA(1, 1) model have small autocorrelations, which is consistent 
with a realisation of white noise and supports the use of the model. 


> x.ma <- arima(x.ts, order = c(0, 0, 1))
> x.ar <- arima(x.ts, order = c(1, 0, 0))
> x.arma <- arima(x.ts, order = c(1, 0, 1))
> AIC(x.ma)
[1] -3.53
> AIC(x.ar)
[1] -37.4
> AIC(x.arma)
[1] -42.3
> x.arma

Call:
arima(x = x.ts, order = c(1, 0, 1))

Coefficients:
        ar1    ma1  intercept
      0.892  0.532      2.960
s.e.  0.076  0.202      0.244

sigma^2 estimated as 0.0151:  log likelihood = 25.1,  aic = -42.3

> acf(resid(x.arma))


Fig. 6.4. The correlogram of residual series for the ARMA(1, 1) model fitted to the 
exchange rate data. 


6.6.3 Electricity production series 


Consider the Australian electricity production series introduced in §1.4.3. The 
data exhibit a clear positive trend and a regular seasonal cycle. Furthermore, 
the variance increases with time, which suggests a log-transformation may be 
appropriate (Fig. 1.5). A regression model is fitted to the logarithms of the 
original series in the code below. 

> www <- "http://www.massey.ac.nz/~pscowper/ts/cbe.dat"
> CBE <- read.table(www, header = T)
> Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
> Time <- 1:length(Elec.ts)
> Imth <- cycle(Elec.ts)
> Elec.lm <- lm(log(Elec.ts) ~ Time + I(Time^2) + factor(Imth))
> acf(resid(Elec.lm))

The correlogram of the residuals appears to cycle with a period of 12 months
suggesting that the monthly indicator variables are not sufficient to account 
for the seasonality in the series (Fig. 6.5). In the next chapter, we find that this 
can be accounted for using a non-stationary model with a stochastic seasonal 
component. In the meantime, we note that the best fitting ARMA(p, q) model 
can be chosen using the smallest AIC either by trying a range of combinations 
of p and q in the arima function or using a for loop with upper bounds on 
p and q — taken as 2 in the code shown below. In each step of the for loop, 
the AIC of the fitted model is compared with the currently stored smallest 
value. If the model is found to be an improvement (i.e., has a smaller AIC 
value), then the new value and model are stored. To start with, best.aic is 
initialised to infinity (Inf). After the loop is complete, the best model can 
be found in best.order, and in this case the best model turns out to be an 
AR(2) model. 
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Fig. 6.5. Electricity production series: correlogram of the residual series of the fitted 
regression model. 


> best.order <- c(0, 0, 0)
> best.aic <- Inf
> for (i in 0:2) for (j in 0:2) {
    fit.aic <- AIC(arima(resid(Elec.lm), order = c(i, 0, j)))
    if (fit.aic < best.aic) {
      best.order <- c(i, 0, j)
      best.arma <- arima(resid(Elec.lm), order = best.order)
      best.aic <- fit.aic
    }
  }
> best.order
[1] 2 0 0
> acf(resid(best.arma))
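
The search over p and q could also be wrapped in a small function. The following is only a sketch: select.arma is a hypothetical helper name, the data are simulated from an AR(2) process rather than taken from the electricity residuals, and tryCatch is used here (an addition to the approach in the text) so that a fit that fails is skipped rather than stopping the loop.

```r
# Sketch: choose p and q by smallest AIC, skipping any fit that fails.
# 'select.arma' is a hypothetical helper, not a function from the text.
select.arma <- function(x, max.p = 2, max.q = 2) {
  best <- list(aic = Inf, order = c(0, 0, 0), fit = NULL)
  for (p in 0:max.p) for (q in 0:max.q) {
    fit <- tryCatch(arima(x, order = c(p, 0, q)), error = function(e) NULL)
    if (!is.null(fit) && AIC(fit) < best$aic) {
      best <- list(aic = AIC(fit), order = c(p, 0, q), fit = fit)
    }
  }
  best
}

set.seed(1)
z <- arima.sim(n = 500, list(ar = c(0.5, 0.3)))  # simulated AR(2) data
best <- select.arma(z)
best$order
```

Returning the fitted object alongside the order avoids refitting the chosen model afterwards.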


The predict function can be used both to forecast future values from 
the fitted regression model and forecast the future errors associated with the 
regression model using the ARMA model fitted to the residuals from the 
regression. These two forecasts can then be summed to give a forecasted value 
of the logarithm for electricity production, which would then need to be anti- 
logged and perhaps adjusted using a bias correction factor. As predict is 
a generic R function, it works in different ways for different input objects 
and classes. For a fitted regression model of class lm, the predict function
requires the new set of data to be in the form of a data frame (object class 
data.frame). For a fitted ARMA model of class arima, the predict function 
requires just the number of time steps ahead for the desired forecast. In the 
latter case, predict produces an object that has both the predicted values and 
their standard errors, which can be extracted using pred and se, respectively. 
In the code below, the electricity production for each month of the next three 
years is predicted. 


> new.time <- seq(length(Elec.ts) + 1, length = 36)  # Time continues from n + 1
> new.data <- data.frame(Time = new.time, Imth = rep(1:12, 3))
> predict.lm <- predict(Elec.lm, new.data)
> predict.arma <- predict(best.arma, n.ahead = 36)
> elec.pred <- ts(exp(predict.lm + predict.arma$pred), start = 1991,
    freq = 12)
> ts.plot(cbind(Elec.ts, elec.pred), lty = 1:2)
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Fig. 6.6. Electricity production series: correlogram of the residual series of the 
best-fitting ARMA model. 
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The plot of the forecasted values suggests that the predicted values for 
winter may be underestimated by the fitted model (Fig. 6.7), which may be 
due to the remaining seasonal autocorrelation in the residuals (see Fig. 6.6). 
This problem will be addressed in the next chapter. 
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Fig. 6.7. Electricity production series: observed (solid line) and forecasted values 
(dotted line). The forecasted values are not likely to be accurate because of the 
seasonal autocorrelation present in the residuals for the fitted model. 


6.6.4 Wave tank data 


The data in the file wave.dat are the surface height of water (mm), relative 
to the still water level, measured using a capacitance probe positioned at the 
centre of a wave tank. The continuous voltage signal from this capacitance 
probe was sampled every 0.1 second over a 39.6-second period. The objective 
is to fit a suitable ARMA(p, q) model that can be used to generate a realistic 
wave input to a mathematical model for an ocean-going tugboat in a computer 
simulation. The results of the computer simulation will be compared with tests 
using a physical model of the tugboat in the wave tank. 

The pacf suggests that p should be at least 2 (Fig. 6.8). The best-fitting 
ARMA(p, q) model, based on a minimum variance of residuals, was obtained 
with both p and q equal to 4. The acf and pacf of the residuals from this model 
are consistent with the residuals being a realisation of white noise (Fig. 6.9). 


> www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
> wave.dat <- read.table(www, header = T)
> attach(wave.dat)
> layout(1:3)
> plot(as.ts(waveht), ylab = 'Wave height')
> acf(waveht)
> pacf(waveht)
> wave.arma <- arima(waveht, order = c(4,0,4))
> acf(wave.arma$res[-(1:4)])
> pacf(wave.arma$res[-(1:4)])
> hist(wave.arma$res[-(1:4)], xlab = 'height / mm', main = '')
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Fig. 6.9. Residuals after fitting an ARMA(4, 4) model to wave heights: acf, pacf, 
and histogram. 
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6.7 Summary of R commands 


arima.sim simulates data from an ARMA (or ARIMA) process 
arima fits an ARMA (or ARIMA) model to data 

seq generates a sequence 

expression used to plot maths symbol 


6.8 Exercises 


1. Using the relation $\mathrm{Cov}(\sum x_t, \sum y_t) = \sum\sum \mathrm{Cov}(x_t, y_t)$ (Equation (2.15))
for time series $\{x_t\}$ and $\{y_t\}$, prove Equation (6.3).


2. The series $\{w_t\}$ is white noise with zero mean and variance $\sigma_w^2$. For the
following moving average models, find the autocorrelation function and
determine whether they are invertible. In addition, simulate 100 observations
for each model in R, compare the time plots of the simulated series,
and comment on how the two series might be distinguished.

a) $x_t = w_t + \frac{1}{2} w_{t-1}$
b) $x_t = w_t + 2 w_{t-1}$


3. Write the following models in ARMA(p, q) notation and determine whether
they are stationary and/or invertible ($w_t$ is white noise). In each case,
check for parameter redundancy and ensure that the ARMA(p, q) notation
is expressed in the simplest form.

a) $x_t = x_{t-1} - \frac{1}{4} x_{t-2} + w_t + \frac{1}{2} w_{t-1}$
b) $x_t = 2x_{t-1} - x_{t-2} + w_t$
c) $x_t = \frac{3}{2} x_{t-1} - \frac{1}{2} x_{t-2} + w_t - \frac{1}{2} w_{t-1} + \frac{1}{4} w_{t-2}$
d) $x_t = \frac{3}{2} x_{t-1} - \frac{1}{2} x_{t-2} + \frac{1}{2} w_t - w_{t-1}$
e) $x_t = \frac{7}{10} x_{t-1} - \frac{1}{10} x_{t-2} + w_t - \frac{3}{2} w_{t-1}$
f) $x_t = \frac{3}{2} x_{t-1} - \frac{1}{2} x_{t-2} + w_t - \frac{1}{3} w_{t-1} + \frac{1}{6} w_{t-2}$

4. a) Fit a suitable regression model to the air passenger series. Comment 
on the correlogram of the residuals from the fitted regression model. 


b) Fit an ARMA(p, q) model for values of p and q no greater than 2 
to the residual series of the fitted regression model. Choose the best 
fitting model based on the AIC and comment on its correlogram. 

c) Forecast the number of passengers travelling on the airline in 1961. 


5. a) Write an R function that calculates the autocorrelation function (Equation
(6.13)) for an ARMA(1, 1) process. Your function should take
parameters representing $\alpha$ and $\beta$ for the AR and MA components.

b) Plot the autocorrelation function above for the case with $\alpha = 0.7$ and
$\beta = -0.5$ for lags 0 to 20.

c) Simulate n = 100 values of the ARMA(1, 1) model with $\alpha = 0.7$
and $\beta = -0.5$, and compare the sample correlogram to the theoretical
correlogram plotted in part (b). Repeat for n = 1000.


6. Let $\{x_t : t = 1, \ldots, n\}$ be a stationary time series with $E(x_t) = \mu$,
$\mathrm{Var}(x_t) = \sigma^2$, and $\mathrm{Cor}(x_t, x_{t+k}) = \rho_k$. Using Equation (5.5) from Chapter 5:

a) Calculate $\mathrm{Var}(\bar{x})$ when $\{x_t\}$ is the MA(1) process $x_t = w_t + \frac{1}{2} w_{t-1}$.

b) Calculate $\mathrm{Var}(\bar{x})$ when $\{x_t\}$ is the MA(1) process $x_t = w_t - \frac{1}{2} w_{t-1}$.

c) Compare each of the above with the variance of the sample mean
obtained for the white noise case $\rho_k = 0$ ($k > 0$). Of the three models,
which would have the most accurate estimate of $\mu$ based on the
variances of their sample means?

A simulated example that extracts the variance of the sample mean
for 100 Gaussian white noise series each of length 20 is given by

> set.seed(1)
> m <- rep(0, 100)
> for (i in 1:100) m[i] <- mean(rnorm(20))
> var(m)
[1] 0.0539

For each of the two MA(1) processes, write R code that extracts the
variance of the sample mean of 100 realisations of length 20. Compare
them with the variances calculated in parts (a) and (b).


7. If the sample autocorrelation function of a time series appears to cut off
after lag q (i.e., autocorrelations at lags higher than q are not significantly
different from 0 and do not follow any clear patterns), then an MA(q)
model might be suitable. An AR(p) model is indicated when the partial
autocorrelation function cuts off after lag p. If there are no convincing
cutoff points for either function, an ARMA model may provide the best
fit.

Plot the autocorrelation and partial autocorrelation functions for the
simulated ARMA(1, 1) series given in §6.6.1. Using the AIC, choose a
best-fitting AR model and a best-fitting MA model. Which best-fitting
model (AR or MA) has the smaller number of parameters? Compare this
model with the fitted ARMA(1, 1) model of §6.6.1, and comment.
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Non-stationary Models 


7.1 Purpose 


As we have discovered in the previous chapters, many time series are non- 
stationary because of seasonal effects or trends. In particular, random walks, 
which characterise many types of series, are non-stationary but can be trans- 
formed to a stationary series by first-order differencing (§4.4). In this chap- 
ter we first extend the random walk model to include autoregressive and 
moving average terms. As the differenced series needs to be aggregated (or 
‘integrated’) to recover the original series, the underlying stochastic process 
is called autoregressive integrated moving average, which is abbreviated to 
ARIMA. 

The ARIMA process can be extended to include seasonal terms, giving a 
non-stationary seasonal ARIMA (SARIMA) process. Seasonal ARIMA models 
are powerful tools in the analysis of time series as they are capable of modelling 
a very wide range of series. Much of the methodology was pioneered by Box 
and Jenkins in the 1970s.

Series may also be non-stationary because the variance is serially corre- 
lated (technically known as conditionally heteroskedastic), which usually re- 
sults in periods of volatility, where there is a clear change in variance. This 
is common in financial series, but may also occur in other series such as cli- 
mate records. One approach to modelling series of this nature is to use an 
autoregressive model for the variance, i.e. an autoregressive conditional het- 
eroskedastic (ARCH) model. We consider this approach, along with the gen- 
eralised ARCH (GARCH) model in the later part of the chapter. 


7.2 Non-seasonal ARIMA models 


7.2.1 Differencing and the electricity series 


Differencing a series $\{x_t\}$ can remove trends, whether these trends are stochastic,
as in a random walk, or deterministic, as in the case of a linear trend. In
the case of a random walk, $x_t = x_{t-1} + w_t$, the first-order differenced series
is white noise $\{w_t\}$ (i.e., $\nabla x_t = x_t - x_{t-1} = w_t$) and so is stationary.
In contrast, if $x_t = a + bt + w_t$, a linear trend with white noise errors, then
$\nabla x_t = x_t - x_{t-1} = b + w_t - w_{t-1}$, which is a stationary moving average process
rather than white noise. Notice that the consequence of differencing a linear
trend with white noise is an MA(1) process, whereas subtraction of the trend,
$a + bt$, would give white noise. This raises an issue of whether or not it is sensible
to use differencing to remove a deterministic trend. The arima function
in R does not allow the fitted differenced models to include a constant. If you
wish to fit a differenced model to a deterministic trend using R you need to
difference, then mean adjust the differenced series to have a mean of 0, and
then fit an ARMA model to the adjusted differenced series using arima with
include.mean set to FALSE and d = 0.

A corresponding issue arises with simulations from an ARIMA model.
Suppose $x_t = a + bt + w_t$ so $\nabla x_t = y_t = b + w_t - w_{t-1}$. It follows directly
from the definitions that the inverse of $y_t = \nabla x_t$ is $x_t = x_0 + \sum_{i=1}^{t} y_i$. If an
MA(1) model is fitted to the differenced time series, $\{y_t\}$, the coefficient of
$w_{t-1}$ is unlikely to be identified as precisely $-1$. It follows that the simulated
$\{x_t\}$ will have increasing variance (Exercise 3) about a straight line.
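
The effect of differencing a linear trend with white noise errors can be illustrated by simulation. In the sketch below the values a = 2 and b = 0.1 are arbitrary assumptions chosen for illustration, and the object names are not from the text. For the differenced series $b + w_t - w_{t-1}$, the lag 1 autocorrelation is $-0.5$ in theory (the autocovariances are $2\sigma^2$ at lag 0 and $-\sigma^2$ at lag 1), whereas the residuals after subtracting a fitted straight line should be close to white noise.

```r
# Sketch: difference versus detrend a linear trend with white noise errors.
# a = 2 and b = 0.1 are arbitrary illustrative values.
set.seed(1)
n <- 500
tt <- 1:n
w <- rnorm(n)
x <- 2 + 0.1 * tt + w

# First-order differences: y_t = b + w_t - w_{t-1}, an MA(1) process
y <- diff(x)
mean(y)                                  # estimates the slope b
r1.diff <- acf(y, plot = FALSE)$acf[2]   # lag 1 autocorrelation of differences

# Subtracting a fitted straight line instead
detr <- resid(lm(x ~ tt))
r1.detr <- acf(detr, plot = FALSE)$acf[2]

c(differenced = r1.diff, detrended = r1.detr)
```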

We can take first-order differences in R using the difference function diff. 
For example, with the Australian electricity production series, the code below 
plots the data and first-order differences of the natural logarithm of the series. 
Note that in the layout command below the first figure is allocated two 1s
and is therefore plotted over half (i.e., the first two fourths) of the frame. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/cbe.dat"
> CBE <- read.table(www, header = T)
> Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
> layout(c(1, 1, 2, 3))
> plot(Elec.ts)
> plot(diff(Elec.ts))
> plot(diff(log(Elec.ts)))
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The increasing trend is no longer apparent in the plots of the differenced series 
(Fig. 7.1). 


7.2.2 Integrated model 


A series $\{x_t\}$ is integrated of order $d$, denoted as I($d$), if the $d$th difference of
$\{x_t\}$ is white noise $\{w_t\}$; i.e., $\nabla^d x_t = w_t$. Since $\nabla^d \equiv (1 - B)^d$, where $B$ is
the backward shift operator, a series $\{x_t\}$ is integrated of order $d$ if

\[
(1 - B)^d x_t = w_t \tag{7.1}
\]

Fig. 7.1. (a) Plot of Australian electricity production series; (b) plot of the first-order
differenced series; (c) plot of the first-order differenced log-transformed series.

The random walk is the special case I(1). The diff command from the previous
section can be used to obtain higher-order differencing either by repeated
application or by setting the parameter differences (abbreviated to d here) to the
required value; e.g., diff(diff(x)) and diff(x, d = 2) would both produce
second-order differenced series of x. Second-order differencing may sometimes
successfully reduce a series with an underlying curve trend to white noise. A further
parameter (lag) can be used to set the lag of the differencing. By default, lag is set to
unity, but other values can be useful for removing additive seasonal effects.
For example, diff(x, lag = 12) will remove both a linear trend and additive
seasonal effects in a monthly series.
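
The two routes to second-order differencing, and the effect of lag 12 differencing, can be checked on artificial data. This is only a sketch: the series x and y below are constructed for illustration (a doubly integrated random series, and a noise-free linear trend with slope 0.2 plus arbitrary monthly effects), and none of the object names come from the text.

```r
# Sketch: equivalent ways of taking second-order differences.
set.seed(1)
x <- cumsum(cumsum(rnorm(120)))        # an arbitrary series, integrated twice
d2.a <- diff(diff(x))                  # repeated application
d2.b <- diff(x, differences = 2)       # full argument name
d2.c <- diff(x, d = 2)                 # abbreviated argument name
all.equal(d2.a, d2.b)
all.equal(d2.b, d2.c)

# Lag 12 differencing of a linear trend plus additive monthly effects
tt <- 1:120
s <- rep(c(5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4), 10)  # arbitrary monthly effects
y <- 1 + 0.2 * tt + s
d12 <- diff(y, lag = 12)
range(d12)                             # constant: 12 times the slope
```

Without a noise term the lag 12 differences are constant, which shows that both the trend and the seasonal effects have been removed, leaving only 12 times the slope.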


7.2.3 Definition and examples 


A time series $\{x_t\}$ follows an ARIMA(p, d, q) process if the $d$th differences of
the $\{x_t\}$ series are an ARMA(p, q) process. If we introduce $y_t = (1 - B)^d x_t$,
then $\theta_p(B) y_t = \phi_q(B) w_t$. We can now substitute for $y_t$ to obtain the more
succinct form for an ARIMA(p, d, q) process as

\[
\theta_p(B)(1 - B)^d x_t = \phi_q(B) w_t \tag{7.2}
\]

where $\theta_p$ and $\phi_q$ are polynomials of orders $p$ and $q$, respectively. Some examples
of ARIMA models are:

(a) $x_t = x_{t-1} + w_t + \beta w_{t-1}$, where $\beta$ is a model parameter. To see which model
this represents, collect together like terms, factorise them, and express
them in terms of the backward shift operator: $(1 - B)x_t = (1 + \beta B)w_t$.
Comparing this with Equation (7.2), we can see that $\{x_t\}$ is ARIMA(0, 1, 1),
which is sometimes called an integrated moving average model, denoted
as IMA(1, 1). In general, ARIMA(0, d, q) $\equiv$ IMA(d, q).

(b) $x_t = \alpha x_{t-1} + x_{t-1} - \alpha x_{t-2} + w_t$, where $\alpha$ is a model parameter. Rearranging
and factorising gives $(1 - \alpha B)(1 - B)x_t = w_t$, which is ARIMA(1, 1, 0),
also known as an integrated autoregressive process and denoted as ARI(1, 1).
In general, ARI(p, d) $\equiv$ ARIMA(p, d, 0).
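
The factorisation in example (b) can be checked by expanding the operator product. The short derivation below is a worked sketch that uses only the backward shift notation already defined.

```latex
\begin{aligned}
(1 - \alpha B)(1 - B)x_t &= \left(1 - (1 + \alpha)B + \alpha B^2\right)x_t \\
                         &= x_t - (1 + \alpha)x_{t-1} + \alpha x_{t-2} = w_t \\
\Rightarrow \quad x_t    &= \alpha x_{t-1} + x_{t-1} - \alpha x_{t-2} + w_t
\end{aligned}
```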


7.2.4 Simulation and fitting 


An ARIMA(p, d, q) process can be fitted to data using the R function arima
with the parameter order set to c(p, d, q). An ARIMA(p, d, q) process can
be simulated in R by writing appropriate code. For example, in the code below,
data for the ARIMA(1, 1, 1) model $x_t = 0.5x_{t-1} + x_{t-1} - 0.5x_{t-2} + w_t + 0.3w_{t-1}$
are simulated and the model fitted to the simulated series to recover the
parameter estimates.


> set.seed(1)
> x <- w <- rnorm(1000)
> for (i in 3:1000) x[i] <- 0.5 * x[i - 1] + x[i - 1] - 0.5 *
    x[i - 2] + w[i] + 0.3 * w[i - 1]
> arima(x, order = c(1, 1, 1))

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
        ar1    ma1
      0.423  0.331
s.e.  0.043  0.045

sigma^2 estimated as 1.07:  log likelihood = -1450,  aic = 2906


Writing your own code has the advantage in that it helps to ensure that you 
understand the model. However, an ARIMA simulation can be carried out 
using the inbuilt R function arima.sim, which has the parameters model and 
n to specify the model and the simulation length, respectively. 

> x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5,
    ma = 0.3), n = 1000)
> arima(x, order = c(1, 1, 1))

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
        ar1    ma1
      0.557  0.250
s.e.  0.037  0.044

sigma^2 estimated as 1.08:  log likelihood = -1457,  aic = 2921


7.2.5 IMA(1, 1) model fitted to the beer production series 


The Australian beer production series is in the second column of the data frame
CBE in §7.2.1. The beer data are dominated by a trend of increasing beer production
over the period, so a simple integrated model IMA(1, 1) is fitted to
allow for this trend and a carryover of production from the previous month.
The IMA(1, 1) model is often appropriate because it represents a linear trend 
with white noise added. The residuals are analysed using the correlogram (Fig. 
7.2), which has peaks at yearly cycles and suggests that a seasonal term is 
required. 

> Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12) 

> Beer.ima <- arima(Beer.ts, order = c(0, 1, 1)) 

> Beer.ima 


Call: 
arima(x = Beer.ts, order = c(0, 1, 1)) 


Coefficients:
         ma1
      -0.333
s.e.   0.056


sigma^2 estimated as 360:  log likelihood = -1723,  aic = 3451
> acf(resid(Beer.ima))


From the output above the fitted model is $x_t = x_{t-1} + w_t - 0.33w_{t-1}$. Forecasts
can be obtained using this model, with $t$ set to the value required for the
forecast. Forecasts can also be obtained using the predict function in R with
the parameter n.ahead set to the number of values in the future. For example,
the production for the next year in the record is obtained using predict and
the total annual production for 1991 obtained by summation:

> Beer.1991 <- predict(Beer.ima, n.ahead = 12)


> sum(Beer.1991$pred) 
[1] 2365 
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Fig. 7.2. Australian beer series: correlogram of the residuals of the fitted IMA(1, 
1) model 


7.3 Seasonal ARIMA models 


7.3.1 Definition 


A seasonal ARIMA model uses differencing at a lag equal to the number of
seasons ($s$) to remove additive seasonal effects. As with lag 1 differencing to
remove a trend, the lag $s$ differencing introduces a moving average term. The
seasonal ARIMA model includes autoregressive and moving average terms at
lag $s$. The seasonal ARIMA(p, d, q)(P, D, Q)$_s$ model can be most succinctly
expressed using the backward shift operator

\[
\Theta_P(B^s)\theta_p(B)(1 - B^s)^D(1 - B)^d x_t = \Phi_Q(B^s)\phi_q(B)w_t \tag{7.3}
\]

where $\Theta_P$, $\theta_p$, $\Phi_Q$, and $\phi_q$ are polynomials of orders $P$, $p$, $Q$, and $q$,
respectively. In general, the model is non-stationary, although if $D = d = 0$ and the
roots of the characteristic equation (polynomial terms on the left-hand side of
Equation (7.3)) all exceed unity in absolute value, the resulting model would
be stationary. Some examples of seasonal ARIMA models are:


(a) A simple AR model with a seasonal period of 12 units, denoted as
ARIMA(0, 0, 0)(1, 0, 0)$_{12}$, is $x_t = \alpha x_{t-12} + w_t$. Such a model would
be appropriate for monthly data when only the value in the month of the
previous year influences the current monthly value. The model is stationary
when $|\alpha^{-1/12}| > 1$.

(b) It is common to find series with stochastic trends that nevertheless
have seasonal influences. The model in (a) above could be extended to
$x_t = x_{t-1} + \alpha x_{t-12} - \alpha x_{t-13} + w_t$. Rearranging and factorising gives
$(1 - \alpha B^{12})(1 - B)x_t = w_t$ or $\Theta_1(B^{12})(1 - B)x_t = w_t$, which, on comparing
with Equation (7.3), is ARIMA(0, 1, 0)(1, 0, 0)$_{12}$. Note that this
model could also be written $\nabla x_t = \alpha \nabla x_{t-12} + w_t$, which emphasises that
the change at time $t$ depends on the change at the same time (i.e., month)
of the previous year. The model is non-stationary since the polynomial on
the left-hand side contains the term $(1 - B)$, which implies that there
exists a unit root $B = 1$.


(c) A simple quarterly seasonal moving average model is $x_t = (1 - \beta B^4)w_t =
w_t - \beta w_{t-4}$. This is stationary and only suitable for data without a trend.
If the data also contain a stochastic trend, the model could be extended
to include first-order differences, $x_t = x_{t-1} + w_t - \beta w_{t-4}$, which is an
ARIMA(0, 1, 0)(0, 0, 1)$_4$ process. Alternatively, if the seasonal terms contain
a stochastic trend, differencing can be applied at the seasonal period
to give $x_t = x_{t-4} + w_t - \beta w_{t-4}$, which is ARIMA(0, 0, 0)(0, 1, 1)$_4$.


You should be aware that differencing at lag s will remove a linear trend, 
so there is a choice whether or not to include lag 1 differencing. If lag 1 
differencing is included, when a linear trend is appropriate, it will introduce 
moving average terms into a white noise series. As an example, consider a time 
series of period 4 that is the sum of a linear trend, four additive seasonals, 
and white noise: 
\[
x_t = a + bt + s_{[t]} + w_t
\]

where $[t]$ is the remainder after division of $t$ by 4, so $s_{[t]} = s_{[t-4]}$. First, consider
first-order differencing at lag 4 only. Then,

\[
\begin{aligned}
(1 - B^4)x_t &= x_t - x_{t-4} \\
             &= a + bt - (a + b(t - 4)) + s_{[t]} - s_{[t-4]} + w_t - w_{t-4} \\
             &= 4b + w_t - w_{t-4}
\end{aligned}
\]

Formally, the model can be expressed as ARIMA(0, 0, 0)(0, 1, 1)$_4$ with a
constant term $4b$. Now suppose we apply first-order differencing at lag 1 before
differencing at lag 4. Then,

\[
\begin{aligned}
(1 - B^4)(1 - B)x_t &= (1 - B^4)(b + s_{[t]} - s_{[t-1]} + w_t - w_{t-1}) \\
                    &= w_t - w_{t-1} - w_{t-4} + w_{t-5}
\end{aligned}
\]

which is an ARIMA(0, 1, 1)(0, 1, 1)$_4$ model with no constant term.
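
Both differencing results can be verified numerically on a simulated quarterly series. The sketch below is illustrative only: the values of a, b, and the four seasonal effects are arbitrary assumptions, and the remainder [t] is shifted by one to suit R's 1-based indexing.

```r
# Sketch: check the lag 4 and lag 1 + lag 4 differencing identities.
# a, b and the seasonal effects s are arbitrary illustrative values.
set.seed(1)
n <- 200
a <- 10
b <- 0.5
s <- c(3, -1, -4, 2)
tt <- 1:n
w <- rnorm(n)
x <- a + b * tt + s[tt %% 4 + 1] + w   # remainder 0..3 mapped to index 1..4

# Lag 4 differencing only: 4b + w_t - w_{t-4}
d4 <- diff(x, lag = 4)
all.equal(d4, 4 * b + diff(w, lag = 4))

# Lag 1 then lag 4 differencing: w_t - w_{t-1} - w_{t-4} + w_{t-5}
d14 <- diff(diff(x), lag = 4)
rhs <- w[6:n] - w[5:(n - 1)] - w[2:(n - 4)] + w[1:(n - 5)]
all.equal(d14, rhs)
```

Both comparisons hold for any choice of a, b, and seasonal effects, since these terms cancel exactly in the differences.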


7.3.2 Fitting procedure 


Seasonal ARIMA models can potentially have a large number of parameters 
and combinations of terms. Therefore, it is appropriate to try out a wide 
range of models when fitting to data and to choose a best-fitting model using 
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an appropriate criterion such as the AIC. Once a best-fitting model has been 
found, the correlogram of the residuals should be verified as white noise. Some 
confidence in the best-fitting model can be gained by deliberately overfitting 
the model by including further parameters and observing an increase in the 
AIC. 

In R, this approach to fitting a range of seasonal ARIMA models is straight- 
forward, since the fitting criteria can be called by nesting functions and the 
‘up arrow’ on the keyboard used to recall the last command, which can then 
be edited to try a new model. Any obvious terms, such as a differencing term 
if there is a trend, should be included and retained in the model to reduce 
the number of comparisons. The model can be fitted with the arima function, 
which requires an additional parameter seasonal to specify the seasonal com- 
ponents. In the example below, we fit two models with first-order terms to 
the logarithm of the electricity production series. The first uses autoregressive 
terms and the second uses moving average terms. The parameter d = 1 is re- 
tained in both the models since we found in §7.2.1 that first-order differencing 
successfully removed the trend in the series. The seasonal ARI model provides 
the better fit since it has the smaller AIC.


> AIC(arima(log(Elec.ts), order = c(1,1,0),
            seas = list(order = c(1,0,0), 12)))
[1] -1765
> AIC(arima(log(Elec.ts), order = c(0,1,1),
            seas = list(order = c(0,0,1), 12)))
[1] -1362


It is straightforward to check a range of models by a trial-and-error approach 
involving just editing a command on each trial to see if an improvement in the 
AIC occurs. Alternatively, we could write a simple function that fits a range of 
ARIMA models and selects the best-fitting model. This approach works better 
when the conditional sum of squares method CSS is selected in the arima 
function, as the algorithm is more robust. To avoid overparametrisation, the
consistent Akaike Information Criterion (CAIC; see Bozdogan, 1987) can be
used in model selection. An example program follows.


get.best.arima <- function(x.ts, maxord = c(1,1,1,1,1,1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  for (p in 0:maxord[1]) for(d in 0:maxord[2]) for(q in 0:maxord[3])
    for (P in 0:maxord[4]) for(D in 0:maxord[5]) for(Q in 0:maxord[6])
    {
      fit <- arima(x.ts, order = c(p,d,q),
                   seas = list(order = c(P,D,Q),
                   frequency(x.ts)), method = "CSS")
      # CAIC: -2 * log-likelihood plus (log(n) + 1) per fitted coefficient
      fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
      if (fit.aic < best.aic)
      {
        best.aic <- fit.aic
        best.fit <- fit
        best.model <- c(p,d,q,P,D,Q)
      }
    }
  list(best.aic, best.fit, best.model)
}


> best.arima.elec <- get.best.arima( log(Elec.ts), 
maxord = c(2,2,2,2,2,2)) 
> best.fit.elec <- best.arima.elec[[2]] 
> acf( resid(best.fit.elec) ) 
> best.arima.elec [[3]] 


[1] 0 1 1 2 0 2


> ts.plot( cbind( window(Elec.ts,start = 1981), 
exp(predict(best.fit.elec,12)$pred) ), lty = 1:2) 


From the code above, we see the best-fitting model using terms up to second
order is ARIMA(0, 1, 1)(2, 0, 2)$_{12}$. Although higher-order terms could be tried
by increasing the values in maxord, this would seem unnecessary since the
residuals are approximately white noise (Fig. 7.3b). For the predicted values
(Fig. 7.3a), a bias correction factor could be used, although this would seem
unnecessary given that the residual standard deviation is small compared with
the predictions.


7.4 ARCH models 
7.4.1 S&P500 series 


Standard and Poor's (of the McGraw-Hill companies) publishes a range of
financial indices and credit ratings. Consider the following time plot and
correlogram of the daily returns of the S&P500 Index¹ (from January 2, 1990 to
December 31, 1999), available in the MASS library within R.

> library (MASS) 

> data(SP500) 

> plot(SP500, type = 'l')

> acf (SP500) 

The time plot of the returns is shown in Figure 7.4(a), and at first glance 

the series appears to be a realisation of a stationary process. However, on 


1 
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Fig. 7.3. Electricity production series: (a) time plot for last 10 years, with added 
predicted values (dotted); (b) correlogram of the residuals of the best-fitting seasonal 
ARIMA model. 


closer inspection, it seems that the variance is smallest in the middle third of 
the series and greatest in the last third. The series exhibits periods of increased 
variability, sometimes called volatility in the financial literature, although it
does not increase in a regular way. When a variance is not constant in time 
but changes in a regular way, as in the airline and electricity data (where the 
variance increased with the trend), the series is called heteroskedastic. If a 
series exhibits periods of increased variance, so the variance is correlated in 
time (as observed in the S&P500 data), 


If a correlogram appears to be white noise (e.g., 
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Fig. 7.4. Standard and Poors returns of the S&P500 Index: (a) time plot; (b) 
correlogram. 


provided the series is adjusted to have a mean of zero). 


he correlogram of the 


> acf((SP500 - mean(SP500))^2) 


7.4.2 Modelling volatility: Definition of the ARCH model 


a B approach to this is to use an autoregressive model 


for the variance process. This leads to the following definition. A series $\{\epsilon_t\}$
is first-order autoregressive conditional heteroskedastic, denoted ARCH(1), if
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Fig. 7.5. Returns of the Standard and Poors S&P500 Index: correlogram of the 
squared mean-adjusted values. 


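\[
\epsilon_t = w_t \sqrt{\alpha_0 + \alpha_1 \epsilon_{t-1}^2} \tag{7.4}
\]

To see how this introduces volatility, square Equation (7.4) to calculate
the variance

\[
\begin{aligned}
\mathrm{Var}(\epsilon_t) &= E(\epsilon_t^2) \\
                         &= E(w_t^2)\, E(\alpha_0 + \alpha_1 \epsilon_{t-1}^2) \\
                         &= E(\alpha_0 + \alpha_1 \epsilon_{t-1}^2) \\
                         &= \alpha_0 + \alpha_1 \mathrm{Var}(\epsilon_{t-1})
\end{aligned} \tag{7.5}
\]

since $\{w_t\}$ has unit variance and $\{\epsilon_t\}$ has zero mean.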
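
If the variance is assumed to be the same at all times (an additional assumption, which requires $0 \le \alpha_1 < 1$), Equation (7.5) can be solved for the unconditional variance. The short worked step below writes $\sigma_\epsilon^2$, a symbol introduced here for illustration, for that common value.

```latex
\sigma_\epsilon^2 = \alpha_0 + \alpha_1 \sigma_\epsilon^2
\quad\Longrightarrow\quad
\sigma_\epsilon^2 = \frac{\alpha_0}{1 - \alpha_1}
```

Under this assumption the unconditional variance is constant, even though the variance conditional on the previous value changes through time.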


model fitting, 


only be applied to a 


such as might be obtained after fitting 
a satisfactory SARIMA model. 


7.4.3 Extensions and GARCH models 


The first-order ARCH model can be extended to a $p$th-order process by including
higher lags.

\[
\epsilon_t = w_t \sqrt{\alpha_0 + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2} \tag{7.6}
\]

where $\{w_t\}$ is again white noise with zero mean and unit variance.

A further extension, widely used in financial applications, is the generalised
ARCH model, denoted GARCH(q, p), which has the ARCH(p) model as the
special case GARCH(0, p). A series $\{\epsilon_t\}$ is GARCH(q, p) if

\[
\epsilon_t = w_t \sqrt{h_t} \tag{7.7}
\]

where

\[
h_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j} \tag{7.8}
\]

and $\alpha_i$ and $\beta_j$ ($i = 0, 1, \ldots, p$; $j = 1, \ldots, q$) are model parameters.


An example now follows.


7.4.4 Simulation and fitted GARCH model 


In the following code data are simulated for the GARCH(1, 1) model $a_t =
w_t \sqrt{h_t}$, where $h_t = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 h_{t-1}$ with $\alpha_1 + \beta_1 < 1$ to ensure stability;
e.g., see Enders (1995). The simulated series are placed in the vector a and
the correlograms plotted (Fig. 7.6).


> set.seed(1)
> alpha0 <- 0.1
> alpha1 <- 0.4
> beta1 <- 0.2
> w <- rnorm(10000)
> a <- rep(0, 10000)
> h <- rep(0, 10000)
> for (i in 2:10000) {
    h[i] <- alpha0 + alpha1 * (a[i - 1]^2) + beta1 * h[i - 1]
    a[i] <- w[i] * sqrt(h[i])
  }
> acf(a)
> acf(a^2)

In the following example, 


The default is GARCH(1, 1), which often provides an adequate model, but
higher-order models can be specified with the parameter order = c(p, q)
for some choice of p and q.
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Fig. 7.6. Correlograms for GARCH series: (a) simulated series; (b) squared values 
of simulated series. 


> library(tseries) 


> a.garch <- garch(a, grad = "numerical", trace = FALSE) 
> confint(a.garch) 


    2.5 % 97.5 %
a0 0.0882  0.109
a1 0.3308  0.402
b1 0.1928  0.295


7.4.5 Fit to S&P500 series 
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If the GARCH model is suitable the residual series should appear to be a 
realisation of white noise with zero mean and unit variance. In the case of a 
GARCH(1, 1) model, 


\[
\hat{h}_t = \hat{\alpha}_0 + \hat{\alpha}_1 \hat{\epsilon}_{t-1}^2 + \hat{\beta}_1 \hat{h}_{t-1}
\]


with $\hat{h}_1 = 0$ for $t = 2, \ldots, n$.² The calculations are performed by the function
garch. The first value in the residual series is not available (NA), so we remove
the first value using [-1] and the correlograms are then found for the resultant
residual and squared residual series:


Sp.garch <- garch(SP500, trace = F) 
sp.res <- sp.garch$res[-1] 

acf (sp.res) 

acf(sp.res^2) 


VV MM 


Both correlograms suggest that the residuals of the fitted GARCH model 
behave like white noise, indicating a satisfactory fit has been obtained (Fig. 7.7). 

Fig. 7.7. GARCH model fitted to mean-adjusted S&P500 returns: (a) correlogram 
of the residuals; (b) correlogram of the squared residuals. 
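The residual calculation for a GARCH(1, 1) model can be sketched by hand as follows. This is only an illustration under stated assumptions: the helper name garch11.resid is hypothetical, the parameter values are supplied by the user rather than estimated, and the check uses a simulated series because the true white noise is then known.

```r
# Hypothetical helper: standardised residuals of a GARCH(1, 1) model for
# given parameter values, starting the recursion at h[1] = 0.
garch11.resid <- function(e, a0, a1, b1) {
  n <- length(e)
  h <- numeric(n)                      # h[1] = 0
  for (i in 2:n) h[i] <- a0 + a1 * e[i - 1]^2 + b1 * h[i - 1]
  res <- e / sqrt(h)
  res[1] <- NA                         # not available from the recursion
  res
}

# Check on a series simulated with known parameters: the residuals
# should recover the white noise that generated the series.
set.seed(1)
n <- 500
w <- rnorm(n)
e <- numeric(n)
h <- numeric(n)
for (i in 2:n) {
  h[i] <- 0.1 + 0.4 * e[i - 1]^2 + 0.2 * h[i - 1]
  e[i] <- w[i] * sqrt(h[i])
}
res <- garch11.resid(e, a0 = 0.1, a1 = 0.4, b1 = 0.2)
max(abs(res[-1] - w[-1]))              # essentially zero
```

With estimated rather than true parameters the residuals only approximate the white noise, which is why their correlograms are inspected.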


² Notice that a residual for time t = 1 cannot be calculated from this formula. 


7.4.6 Volatility in climate series 


Recently there have been studies on volatility in climate series (e.g., Romilly, 
2005). Temperature data (1850-2007; see Brohan et al. 2006) for the southern 
hemisphere were extracted from the database maintained by the University 
of East Anglia Climatic Research Unit and edited into a form convenient for 
reading into R. In the following code, the series are read in, plotted (Fig. 7.8), 
and a best-fitting seasonal ARIMA model obtained using the get.best.arima 
function given in §7.3.2. Confidence intervals for the parameters were then 
evaluated (the transpose t() was taken to provide these in rows instead of 
columns). 

Fig. 7.8. The southern hemisphere temperature series. 


> stemp <- scan("http://www.massey.ac.nz/~pscowper/ts/stemp.dat") 
> stemp.ts <- ts(stemp, start = 1850, freq = 12) 
> plot(stemp.ts) 
> stemp.best <- get.best.arima(stemp.ts, maxord = rep(2,6)) 
> stemp.best[[3]] 
[1] 1 1 2 2 0 1 
> stemp.arima <- arima(stemp.ts, order = c(1,1,2), 
    seas = list(order = c(2,0,1), 12)) 
> t( confint(stemp.arima) ) 
          ar1   ma1   ma2  sar1    sar2  sma1 
2.5 %   0.832 -1.45 0.326 0.858 -0.0250 -0.97 
97.5 %  0.913 -1.31 0.453 1.004  0.0741 -0.85 


The second seasonal AR component is not significantly different from zero, 
and therefore the model is refitted leaving this component out: 


> stemp.arima <- arima(stemp.ts, order = c(1,1,2), 
    seas = list(order = c(1,0,1), 12)) 
> t( confint(stemp.arima) ) 
         ar1   ma1   ma2  sar1   sma1 
2.5 %   0.83 -1.45 0.324 0.924 -0.969 
97.5 %  0.91 -1.31 0.451 0.996 -0.868 


To check for goodness-of-fit, the correlogram of residuals from the ARIMA 
model is plotted (Fig. 7.9a). In addition, to investigate volatility, the 
correlogram of the squared residuals is found (Fig. 7.9b). 

Fig. 7.9. Seasonal ARIMA model fitted to the temperature series: (a) correlogram 
of the residuals; (b) correlogram of the squared residuals. 


> stemp.res <- resid(stemp.arima) 
> layout(1:2) 
> acf(stemp.res) 
> acf(stemp.res^2) 


There is clear evidence of volatility since the squared residuals are 
correlated at most lags (Fig. 7.9b). Hence, a GARCH model is fitted to the residual 
series: 


> stemp.garch <- garch(stemp.res, trace = F) 
> t(confint(stemp.garch)) 
             a0     a1    b1 
2.5 %  1.06e-05 0.0330 0.925 
97.5 % 1.49e-04 0.0653 0.963 
> stemp.garch.res <- resid(stemp.garch)[-1] 
> acf(stemp.garch.res) 
> acf(stemp.garch.res^2) 


Based on the output above, we can see that the coefficients of the fitted 
GARCH model are all statistically significant, since zero does not fall in any of 
the confidence intervals. Furthermore, the correlogram of the residuals shows 
no obvious patterns or significant values (Fig. 7.10). Hence, a satisfactory fit 
has been obtained. 

Fig. 7.10. GARCH model fitted to the residuals of the seasonal ARIMA model 
of the temperature series: (a) correlogram of the residuals; (b) correlogram of the 
squared residuals. 


7.4.7 GARCH in forecasts and simulations 


If a GARCH model is fitted to the residual errors of a fitted time series 
model, it will not influence the average prediction at some point in time since 
the mean of the residual errors is zero. Thus, single-point forecasts from a 
fitted time series model remain unchanged when GARCH models are fitted 
to the residuals. However, a fitted GARCH model will affect the variance of 
simulated predicted values and thus result in periods of changing variance or 
volatility in simulated series. 

The main application of GARCH models is for simulation studies, espe- 
cially in finance, insurance, teletraffic, and climatology. In all these applica- 
tions, the periods of high variability tend to lead to untoward events, and it is 
essential to model them in a realistic manner. Typical R code for simulation 
was given in §7.4.4. 
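To illustrate the point that GARCH errors leave the average prediction unchanged but alter the spread of simulated values, the following minimal sketch adds simulated GARCH(1, 1) errors to a set of point forecasts. Everything here is an assumption made for illustration: point.fc is a made-up constant forecast, the parameter values are not estimates from the text, and garch11.errors is a hypothetical helper. For simplicity the recursion is started from zero with a burn-in period rather than from the last fitted values of a real model.

```r
# Hypothetical helper: n GARCH(1, 1) errors after discarding a burn-in.
garch11.errors <- function(n, a0, a1, b1, burn = 200) {
  m <- n + burn
  w <- rnorm(m)
  e <- numeric(m)
  h <- numeric(m)
  for (i in 2:m) {
    h[i] <- a0 + a1 * e[i - 1]^2 + b1 * h[i - 1]
    e[i] <- w[i] * sqrt(h[i])
  }
  e[(burn + 1):m]                      # discard the start-up values
}

set.seed(1)
point.fc <- rep(10, 12)                # made-up 12-step point forecasts
nsim <- 1000
scen <- replicate(nsim, point.fc + garch11.errors(12, 0.1, 0.4, 0.2))
dim(scen)                              # 12 time steps by nsim scenarios
rowMeans(scen)                         # close to the point forecasts
apply(scen, 1, sd)                     # spread of the simulated scenarios
```

Individual columns of scen show bursts of high and low variability, while the row means stay near the point forecasts.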


7.5 Summary of R commands 


garch fits a GARCH (or ARCH) model to data 


7.6 Exercises 


In each of the following, $\{w_t\}$ is white noise with zero mean. 


1. Identify each of the following as specific ARIMA models and state whether 
or not they are stationary. 

a) $z_t = z_{t-1} - 0.25 z_{t-2} + w_t + 0.5 w_{t-1}$ 

b) $z_t = 2 z_{t-1} - z_{t-2} + w_t$ 

c) $z_t = 0.5 z_{t-1} + 0.5 z_{t-2} + w_t - 0.5 w_{t-1} + 0.25 w_{t-2}$ 


2. Identify the following as certain multiplicative seasonal ARIMA models 
and find out whether they are invertible and stationary. 

a) $z_t = 0.5 z_{t-1} + z_{t-4} - 0.5 z_{t-5} + w_t - 0.3 w_{t-1}$ 

b) $z_t = z_{t-1} + z_{t-12} - z_{t-13} + w_t - 0.5 w_{t-1} - 0.5 w_{t-12} + 0.25 w_{t-13}$ 


3. Suppose $x_t = a + bt + w_t$. Define $y_t = \nabla x_t$. 

a) Show that $x_t = x_0 + \sum_{i=1}^{t} y_i$ and identify $x_0$. 

b) Now suppose an MA(1) model is fitted to $\{y_t\}$ and the fitted model is 
$y_t = b + w_t + \beta w_{t-1}$. Show that a simulated $\{x_t\}$ will have increasing 
variance about the line $a + bt$ unless $\beta$ is precisely $-1$. 


4. The number of overseas visitors to New Zealand is recorded for each month 
over the period 1977 to 1995 in the file osvisit.dat on the book website 
(http://www.massey.ac.nz/~pscowper/ts/osvisit.dat). Download the file 
into R and carry out the following analysis. Your solution should include 
any R commands, plots, and comments. Let $x_t$ be the number of overseas 
visitors in time period $t$ (in months) and $z_t = \ln(x_t)$. 

a) Comment on the main features in the correlogram for $\{z_t\}$. 

b) Fit an ARIMA(1, 1, 0) model to $\{z_t\}$ giving the estimated AR 
parameter and the standard deviation of the residuals. Comment on the 
correlogram of the residuals of this fitted ARIMA model. 

c) Fit a seasonal ARIMA(1, 1, 0)(0, 1, 0)$_{12}$ model to $\{z_t\}$ and plot the 
correlogram of the residuals of this model. Has seasonal differencing 
removed the seasonal effect? Comment. 

d) Choose the best-fitting seasonal ARIMA model from the following: 
ARIMA(1, 1, 0)(1, 1, 0)$_{12}$, ARIMA(0, 1, 1)(0, 1, 1)$_{12}$, 
ARIMA(1, 1, 0)(0, 1, 1)$_{12}$, ARIMA(0, 1, 1)(1, 1, 0)$_{12}$, 
ARIMA(1, 1, 1)(1, 1, 1)$_{12}$, ARIMA(1, 1, 1)(1, 1, 0)$_{12}$, 
ARIMA(1, 1, 1)(0, 1, 1)$_{12}$. Base your choice on the AIC, and comment 
on the correlogram of the residuals of the best-fitting model. 

e) Express the best-fitting model in part (d) above in terms of $z_t$, white 
noise components, and the backward shift operator (you will need 
to write this out by hand, but it is not necessary to expand all the 
factors). 

f) Test the residuals from the best-fitting seasonal ARIMA model for 
stationarity. 

g) Forecast the number of overseas visitors for each month in the next 
year (1996), and give the total number of visitors expected in 1996 
under the fitted model. [Hint: To get the forecasts, you will need to use 
the exponential function of the generated seasonal ARIMA forecasts 
and multiply these by a bias correction factor based on the mean 
square residual error.] 


5. Use the get.best.arima function from §7.3.2 to obtain a best-fitting 
ARIMA(p, d, q)(P, D, Q)$_{12}$ for all p, d, q, P, D, Q $\le 2$ to the 
logarithm of the Australian chocolate production series (in the file at 
http://www.massey.ac.nz/~pscowper/ts/cbe.dat). Check that the 
correlogram of the residuals for the best-fitting model is representative of white 
noise. Check the correlogram of the squared residuals. Comment on the 
results. 


6. This question uses the data in stockmarket.dat on the book website 
http://www.massey.ac.nz/~pscowper/ts/, which contains stock market 
data for seven cities for the period January 6, 1986 to December 31, 1997. 
Download the data into R and put the data into a variable x. The first 
three rows should be: 


> x[1:3,] 
  Amsterdam Frankfurt London HongKong   Japan Singapore NewYork 
1    275.76   1425.56 1424.1  1796.59 13053.8    233.63  210.65 
2    275.43   1428.54 1415.2  1815.53 12991.2    237.37  213.80 
3    278.76   1474.24 1404.2  1826.84 13056.4    240.99  207.97 


a) Plot the Amsterdam series and the first-order differences of the series. 
Comment on the plots. 

b) Fit the following models to the Amsterdam series, and select the best 
fitting model: ARIMA(0, 1, 0), ARIMA(1, 1, 0), ARIMA(0, 1, 1), 
ARIMA(1, 1, 1). 

c) Produce the correlogram of the residuals of the best-fitting model and 
the correlogram of the squared residuals. Comment. 

d) Fit the following GARCH models to the residuals, and select the 
best-fitting model: GARCH(0, 1), GARCH(1, 0), GARCH(1, 1), and 
GARCH(0, 2). Give the estimated parameters of the best-fitting 
model. 

e) Plot the correlogram of the residuals from the best fitting GARCH 
model. Plot the correlogram of the squared residuals from the best 
fitting GARCH model, and comment on the plot. 


7. Predict the monthly temperatures for 2008 using the model fitted to the 
climate series in §7.4.6, and add these predicted values to a time plot of 
the temperature series from 1990. Give an upper bound for the predicted 
values based on a 95% confidence level. Simulate ten possible future 
temperature scenarios for 2008. This will involve generating GARCH errors 
and adding these to the predicted values from the fitted seasonal ARIMA 
model. 


Long-Memory Processes 


8.1 Purpose 


Some time series exhibit marked correlations at high lags, and they are re- 
ferred to as long-memory processes. Long-memory is a feature of many geo- 
physical time series. Flows in the Nile River have correlations at high lags, 
and Hurst (1951) demonstrated that this affected the optimal design capacity 
of a dam. Mudelsee (2007) shows that long-memory is a hydrological prop- 
erty that can lead to prolonged drought or temporal clustering of extreme 
floods. At a rather different scale, Leland et al. (1993) found that Ethernet 
local area network (LAN) traffic appears to be statistically self-similar and a 
long-memory process. They showed that the nature of congestion produced by 
self-similar traffic differs drastically from that predicted by the traffic models 
used at that time. Mandelbrot and co-workers investigated the relationship 
between self-similarity and long term memory and played a leading role in 
establishing fractal geometry as a subject of study. 


8.2 Fractional differencing 


Beran (1994) describes the qualitative features of a typical sample path 
(realisation) from a long-memory process. There are relatively long periods during 
which the observations tend to stay at a high level and similar long periods 
during which observations tend to be at a low level. There may appear to 
be trends or cycles over short time periods, but these do not persist and the 
entire series looks stationary. A more objective criterion is that sample 
correlations $r_k$ decay to zero at a rate that is approximately proportional to $k^{-\lambda}$ 
for some $0 < \lambda < 1$. This is noticeably slower than the rate of decay of $r_k$ 
for realisations from an AR(1) process, for example, which is approximately 
proportional to $\lambda^k$ for some $0 < \lambda < 1$. 

The mathematical definition of a stationary process with long-memory, 
also known as long-range dependence or persistence, can be given in terms of 
its autocorrelation function. A stationary process $x_t$ with long-memory has 
an autocorrelation function $\rho_k$ that satisfies the condition 


$$\lim_{k \to \infty} \rho_k = c k^{-\lambda}$$


for some $0 < c$ and $0 < \lambda < 1$. The closer $\lambda$ is to 0, the more pronounced is 
the long-memory. 

The hydrologist Harold Hurst found that for many geophysical records, 
including the Nile River data, a statistic known as the rescaled range (Exercise 
4) over a period $k$ is approximately proportional to $k^H$ for some $H > \frac{1}{2}$. The 
Hurst parameter, $H$, is defined by $H = 1 - \lambda/2$ and hence ranges from $\frac{1}{2}$ 
to 1. The closer $H$ is to 1, the more persistent the time series. If there is no 
long-memory effect, then $H = \frac{1}{2}$. 

A fractionally differenced ARIMA process $\{x_t\}$, FARIMA(p, d, q), has the 
form 


$$\phi(B)(1 - B)^d x_t = \psi(B) w_t \qquad (8.1)$$


for some $-\frac{1}{2} < d < \frac{1}{2}$. The range $0 < d < \frac{1}{2}$ gives long-memory processes. It 
can be useful to introduce the fractionally differenced series $\{y_t\}$ and express 
Equation (8.1) as 


$$y_t = (1 - B)^d x_t = [\phi(B)]^{-1} \psi(B) w_t \qquad (8.2)$$


because this suggests a means of fitting a FARIMA model to time series. For 
a trial value of $d$, we calculate the fractionally differenced series $\{y_t\}$, fit an 
ARIMA model to $\{y_t\}$, and then investigate the residuals. The calculation of 
the fractionally differenced series $\{y_t\}$ follows from a formal binomial 
expansion of $(1 - B)^d$ and is given by 


$$(1 - B)^d = 1 - dB + \frac{d(d-1)}{2!} B^2 - \frac{d(d-1)(d-2)}{3!} B^3 + \cdots$$


curtailed at some suitably large lag ($L$), which might reasonably be set to 40. 
For example, if $d = 0.45$, then 


$$y_t = x_t - 0.450 x_{t-1} - 0.12375 x_{t-2} - 0.0639375 x_{t-3} - \cdots - 0.001287312 x_{t-40}$$


The R code for calculating the coefficients is 


> cf <- rep(0,40) 
> d <- 0.45 
> cf[1] <- -d 
> for (i in 1:39) cf[i+1] <- -cf[i] * (d-i) / (i+1) 


Another equivalent expression for Equation (8.1), which is useful for 
simulations, is 

$$x_t = [\phi(B)]^{-1} \psi(B) (1 - B)^{-d} w_t$$

In simulations, the first step is to calculate $(1 - B)^{-d} w_t$. The operator $(1 - B)^{-d}$ 
needs to be expanded as 


$$(1 - B)^{-d} = 1 - d(-B) + \frac{(-d)(-d-1)}{2!} B^2 - \frac{(-d)(-d-1)(-d-2)}{3!} B^3 + \cdots$$

with the series curtailed at some suitably large lag $L$. The distributions for 
the independent white noise series can be chosen to fit the application, and 
in finance and telecommunications, heavy-tailed distributions are often 
appropriate. In particular, a t-distribution with $\nu$ (> 4) degrees of freedom has 
excess kurtosis $6/(\nu - 4)$ and so is heavy tailed. If, for example, $d = 0.45$ and $L = 40$, 
then 


$$(1 - B)^{-d} w_t = w_t + 0.45 w_{t-1} + 0.32625 w_{t-2} + 0.2664375 w_{t-3} + \cdots + 0.0657056 w_{t-40}$$
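Both binomial expansions in this section follow the same recursion, since the coefficient of $B^k$ in $(1 - B)^d$ is the previous coefficient multiplied by $(k - 1 - d)/k$. The sketch below uses this to compute the leading coefficients for $d = 0.45$ and $d = -0.45$, and then simulates a truncated FARIMA(0, 0.45, 0) series with t-distributed white noise. The helper name fd.coef, the choice of 5 degrees of freedom, and the series length are assumptions made for illustration.

```r
# Hypothetical helper: coefficients of B^0, B^1, ..., B^L in the binomial
# expansion of (1 - B)^d, for positive or negative d.
fd.coef <- function(d, L) {
  k <- 1:L
  cumprod(c(1, (k - 1 - d) / k))
}

fd.coef(0.45, 3)     # 1, -0.45, -0.12375, -0.0639375 (fractional differencing)
fd.coef(-0.45, 3)    # 1,  0.45,  0.32625,  0.2664375 (inverse operator)

# Truncated simulation of (1 - B)^(-d) w_t with heavy-tailed white noise
set.seed(1)
n <- 2000
L <- 40
nu <- 5
w <- rt(n + L, df = nu) * sqrt((nu - 2) / nu)    # t noise scaled to unit variance
psi <- fd.coef(-0.45, L)
x.sim <- stats::filter(w, psi, sides = 1)[(L + 1):(n + L)]
```

The one-sided filter returns NA for the first L positions, which is why L extra noise values are generated and then dropped.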
The autocorrelation function $\rho_k$ of a FARIMA(0, d, 0) process tends towards 


$$\frac{\Gamma(1 - d)}{\Gamma(d)} |k|^{2d-1}$$


for large $k$. The process is stationary provided $-\frac{1}{2} < d < \frac{1}{2}$. This provides 
a relationship between the differencing parameter $d$ and the long-memory 
parameter $\lambda$ when $0 < d$: 

$$2d - 1 = -\lambda \iff d = \frac{1 - \lambda}{2}$$

A FARIMA(0, d, 0) model, with $0 < d < \frac{1}{2}$, lies between a stationary 
AR(1) model and a non-stationary random walk. In practice, for fitting or 
simulation, we have to truncate a FARIMA(0, d, 0) process at some lag L. Then 
it is equivalent to an AR(L) model, but all the coefficients in the FARIMA(0, 
d, 0) model depend on the single parameter d. 


8.3 Fitting to simulated data 


In the following script, the function fracdiff.sim generates a realisation from 
a FARIMA process.¹ The first parameter is the length of the realisation, and 
then AR and MA parameters can be specified (use c() if there is more 
than one of each), followed by a value for d. The default for the discrete white 
noise (DWN) component is standard Gaussian, but this can be varied by 
using innov or rand.gen, as described in help(fracdiff.sim). We then 
fit a FARIMA model to the realisation. In this case, we set the number of 
AR coefficients to be fitted to 1, but when fitting to a time series from an 
unknown model, we should try several values for the number of autoregressive 
and moving average parameters (nar and nma, respectively). 


¹ You will need to have the fracdiff library installed. This can be downloaded 
from CRAN. 


> library(fracdiff) 
> set.seed(1) 
> fds.sim <- fracdiff.sim(10000, ar = 0.9, d = 0.4) 
> x <- fds.sim$series 
> fds.fit <- fracdiff(x, nar = 1) 


In the code below, the first for loop calculates the coefficients for the lagged 
terms in the fractional differences using the fitted value for d. The following 
nested loop then calculates the fractionally differenced time series. Then an 
AR model is fitted to the differenced series and the acf for the residuals is 
plotted (Fig. 8.2). The residuals should appear to be a realisation of DWN. 


> n <- length(x) 
> L <- 30 
> d <- fds.fit$d 
> fdc <- d 
> fdc[1] <- fdc 
> for (k in 2:L) fdc[k] <- fdc[k-1] * (d+1-k) / k 
> y <- rep(0, L) 
> for (i in (L+1):n) { 
    csm <- x[i] 
    for (j in 1:L) csm <- csm + ((-1)^j) * fdc[j] * x[i-j] 
    y[i] <- csm 
  } 
> y <- y[(L+1):n] 
> z.ar <- ar(y) 
> ns <- 1 + z.ar$order 
> z <- z.ar$res[ns:length(y)] 
> par(mfcol = c(2, 2)) 
> plot(as.ts(x), ylab = "x") 
> acf(x) ; acf(y) ; acf(z) 


In Figure 8.1, we show the results when we generate a realisation $\{x_t\}$ from 
a fractional difference model with no AR or MA parameters, FARIMA(0, 0.4, 
0). The very slow decay in both the acf and pacf indicates long-memory. The 
estimate of d is 0.3921. The fractionally differenced series, $\{y_t\}$, appears to be 
a realisation of DWN. If, instead of fitting a FARIMA(0, d, 0) model, we use 
ar, the order selected is 38. The residuals from AR(38) also appear to be a 
realisation from DWN, but the single-parameter FARIMA model is far more 
parsimonious. 

In Figure 8.2, we show the results when we generate a realisation $\{x_t\}$ 
from a FARIMA(1, 0.4, 0) model with an AR parameter of 0.9. The estimates 
of d and the AR parameter, obtained from fracdiff, are 0.429 and 0.884, 
respectively. The estimate of the AR parameter made from the fractionally 
differenced series $\{y_t\}$ using ar is 0.887, and the slight difference is small by 
comparison with the estimated error and is of no practical importance. The 
residuals appear to be a realisation of DWN (Fig. 8.2). 

Fig. 8.1. A simulated series with long-memory FARIMA(0, 0.4, 0): (a) time series 
plot (x); (b) correlogram of series x; (c) partial correlogram of y; (d) correlogram 
after fractional differencing (z). 


> summary(fds.fit) 

Coefficients: 
   Estimate Std. Error z value Pr(>|z|) 
d   0.42904    0.01439    29.8   <2e-16 *** 
ar  0.88368    0.00877   100.7   <2e-16 *** 
ma  0.00000    0.01439     0.0        1 

> ar(y) 

Coefficients: 
    1 
0.887 

Order selected 1  sigma^2 estimated as  1.03 

Fig. 8.2. A time series with long-memory FARIMA(1, 0.4, 0): (a) time series plot 
(x); (b) correlogram of series x; (c) correlogram of the differenced series (y); (d) 
correlogram of the residuals after fitting an AR(1) model (z). 
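The nested loop used in §8.3 to compute the fractionally differenced series is a one-sided linear filter, so the same result can be obtained with the base R function filter. The sketch below compares the two approaches on an arbitrary made-up series. It uses a fixed value d = 0.4 rather than a fitted value, so that it runs without the fracdiff library.

```r
set.seed(1)
x <- cumsum(rnorm(300)) / 10          # any series will do for the comparison
d <- 0.4
L <- 30
n <- length(x)
fdc <- d
for (k in 2:L) fdc[k] <- fdc[k - 1] * (d + 1 - k) / k

# (i) nested loops, as in the script of §8.3
y.loop <- rep(0, L)
for (i in (L + 1):n) {
  csm <- x[i]
  for (j in 1:L) csm <- csm + ((-1)^j) * fdc[j] * x[i - j]
  y.loop[i] <- csm
}
y.loop <- y.loop[(L + 1):n]

# (ii) one-sided filter with weights 1, -fdc[1], fdc[2], -fdc[3], ...
wts <- c(1, (-1)^(1:L) * fdc)
y.filt <- as.numeric(stats::filter(x, wts, sides = 1))[(L + 1):n]

all.equal(y.loop, y.filt)             # the two versions should agree
```

For long series the filter version is considerably faster, because the inner loop runs in compiled code.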


8.4 Assessing evidence of long-term dependence 


8.4.1 Nile minima 


The data in the file Nilemin.txt are annual minimum water levels (mm) 
of the Nile River for the years 622 to 1284, measured at the Roda Island 
gauge near Cairo. It is likely that there may be a trend over a 600-year period 
due to changing climatic conditions or changes to the channels around Roda 
Island. We start the analysis by estimating and removing a linear trend fitted 
by regression. Having done this, a choice of nar is taken as a starting value 
for using fracdiff on the residuals from the regression. Given the iterative 
nature of the fitting process, the choice of initial values for nar and nma should 
not be critical. The estimate of d with nar set at 5 is 0.3457. The best-fitting 
model to the fractionally differenced series is AR(1) with parameter 0.14. We 
now re-estimate d using fracdiff with nar equal to 1, but in this case the 
estimate of d is unchanged. The residuals are a plausible realisation of DWN. 
The acf of the squared residuals indicates that a GARCH model would be 
appropriate. There is convincing evidence of long-term memory in the Nile 
River minima flows (Fig. 8.3). 




[Figure 8.3, six panels. Left column: Nile minima (against Time); Detrended Nile minima (acf against Lag); Fractionally differenced series (against Time). Right column: Fractionally differenced series (acf against Lag); Residuals (acf against Lag); Squared residuals (acf against Lag).] 


Fig. 8.3. Nile River minimum water levels: time series (top left); acf of detrended 
time series (middle left); fractionally differenced detrended series (lower left); acf of 
fractionally differenced series (top right); acf of residuals of AR(1) fitted to frac- 
tionally differenced series (middle right); acf of squared residuals of AR(1) (lower 
right). 


8.4.2 Bellcore Ethernet data 


The data in LAN.txt are the numbers of packet arrivals (bits) in 4000 consecu- 
tive 10-ms intervals seen on an Ethernet at the Bellcore Morristown Research 
and Engineering facility. A histogram of the numbers of bits is remarkably 
skewed, so we work with the logarithm of one plus the number of bits. The 
addition of 1 is needed because there are many intervals in which no pack- 
ets arrive. The correlogram of this transformed time series suggests that a 
FARIMA model may be suitable. 

The estimate of d, with nar set at 48, is 0.3405, and the fractionally dif- 
ferenced series has no substantial correlations. Nevertheless, the function ar 
fits an AR(26) model to this series, and the estimate of the standard devi- 
ation of the errors, 2.10, is slightly less than the standard deviation of the 
fractionally differenced series, 2.13. There is noticeable autocorrelation in the 
series of squared residuals from the AR(26) model, which is a feature of time 
series that have bursts of activity, and this can be modelled as a GARCH 
process (Fig. 8.4). In Exercises 1 and 2, you are asked to look at this case in 
more detail and, in particular, investigate whether an ARMA model is more 
parsimonious. 

Fig. 8.4. Bellcore local area network (LAN) traffic, ln(1 + number of bits): time 
series (top left); acf of time series (middle left); fractionally differenced series (lower 
left); acf of fractionally differenced series (top right); acf of residuals of AR(26) fitted 
to fractionally differenced series (middle right); acf of squared residuals of AR(26) 
(lower right). 


8.4.3 Bank loan rate 


The data in mprime.txt are of the monthly percentage US Federal Reserve 
Bank prime loan rate,² courtesy of the Board of Governors of the Federal 
Reserve System, from January 1949 until November 2007. The time series is 
plotted in the top left of Figure 8.5 and looks as though it could be a realisation 
of a random walk. It also has a period of high variability. The correlogram 
shows very high correlations at smaller lags and substantial correlation up to 
lag 28. Neither a random walk nor a trend is a suitable model for long-term 
simulation of interest rates in a stable economy. Instead, we fit a FARIMA 
model, which has the advantage of being stationary. 


² Data downloaded from Federal Reserve Economic Data at the Federal Reserve 
Bank of St. Louis. 


Fig. 8.5. Federal Reserve Bank interest rates: time series (top left); acf of time series 
(middle left); fractionally differenced series (lower left); acf of fractionally differenced 
series (upper right); acf of residuals of AR(17) fitted to fractionally differenced series 
(middle right); acf of squared residuals of AR(17) (lower right). 


The estimate of d is almost 0, and this implies that the decay of the 
correlations from an initial high value is more rapid than it would be for a 
FARIMA model. The fitted AR model has an order of 17 and is not entirely 
satisfactory because of the statistically significant autocorrelation at lag 1 in 
the residual series. You are asked to do better in Exercise 3. The substantial 
autocorrelations of the squared residuals from the AR(17) model indicate that 
a GARCH model is needed. This has been a common feature of all three time 
series considered in this section. 


8.5 Simulation 


FARIMA models are important for simulation because short-memory models, 
which ignore evidence of long-memory, can lead to serious overestimation of 
system performance. This has been demonstrated convincingly at scales from 
reservoirs to routers in telecommunication networks. 

Realistic models for simulation will typically need to incorporate GARCH 
and heavy-tailed distributions for the basic white noise series. The procedure 
is to fit a GARCH model to the residuals from the AR model fitted to the 
fractionally differenced series. Then the residuals from the GARCH model 
are calculated and a suitable probability distribution can be fitted to these 
residuals (Exercise 5). Having fitted the models, the simulation proceeds by 
generating random numbers from the probability model fitted to the 
GARCH residuals and passing them, in turn, through the fitted GARCH model, 
the inverse of the fractional differencing, and the fitted AR model. 
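The simulation step can be sketched in base R as follows. This is a minimal illustration under stated assumptions and not a fitted model: all parameter values are made up, the white noise is taken to be a scaled t-distribution, the AR part is of order 1, and sim.farima.garch is a hypothetical helper name. The order of operations follows the expression $x_t = [\phi(B)]^{-1} \psi(B) (1 - B)^{-d} w_t$ from §8.2, with GARCH errors in place of $w_t$.

```r
# Hypothetical helper: heavy-tailed noise -> GARCH(1, 1) errors ->
# truncated (1 - B)^(-d) -> AR(1) recursion.
sim.farima.garch <- function(n, d, ar, a0, a1, b1, df, L = 40, burn = 500) {
  m <- n + burn
  # 1. heavy-tailed white noise scaled to unit variance (needs df > 2)
  w <- rt(m, df) * sqrt((df - 2) / df)
  # 2. GARCH(1, 1) errors
  e <- numeric(m)
  h <- numeric(m)
  for (i in 2:m) {
    h[i] <- a0 + a1 * e[i - 1]^2 + b1 * h[i - 1]
    e[i] <- w[i] * sqrt(h[i])
  }
  # 3. apply (1 - B)^(-d), truncated at lag L
  psi <- cumprod(c(1, ((1:L) - 1 + d) / (1:L)))
  u <- stats::filter(e, psi, sides = 1)
  u[is.na(u)] <- 0                     # start-up values, dropped with the burn-in
  # 4. apply the inverse AR(1) operator recursively
  x <- stats::filter(u, ar, method = "recursive")
  as.numeric(x[(burn + 1):m])
}

set.seed(1)
x.sim <- sim.farima.garch(1000, d = 0.3, ar = 0.5,
                          a0 = 0.1, a1 = 0.3, b1 = 0.5, df = 6)
acf(x.sim)         # slow decay, reflecting the long memory
acf(x.sim^2)
```

In an application, the made-up values would be replaced by the estimates from fracdiff, ar, and garch, and the t-distribution by whatever distribution was fitted to the GARCH residuals.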


8.6 Summary of additional commands used 


fracdiff fits a fractionally differenced, FARIMA(p, d, q), model 
fracdiff.sim simulates a FARIMA model 


8.7 Exercises 


1. Read the LAN data into R. 

a) Plot a boxplot and histogram of the number of bits. 

b) Calculate the skewness and kurtosis of the number of bits. 

c) Repeat (a) and (b) for the logarithm of 1 plus the number of bits. 

d) Repeat (a) for the residuals after fitting an AR model to the 
fractionally differenced series. 

e) Fit an ARMA(p, q) model to the fractionally differenced series. Is this 
an improvement on the AR(p) model? 

f) In the text, we set nar in fracdiff at 48. Repeat the analysis with 
nar equal to 2. 

2. Read the LAN data into R. 
a) Calculate the number of bits in 20-ms intervals, and repeat the analysis 
using this time series. 
b) Calculate the number of bits in 40-ms intervals, and repeat the analysis 
using this time series. 
c) Repeat (a) and (b) for realisations from FARIMA(0, d, 0). 


3. Read the Federal Reserve Bank data into R. 
a) Fit a random walk model and comment. 
b) Fit an ARMA(p, q) model and comment. 


4. The rescaled adjusted range is calculated for a time series $\{x_t\}$ of length 
$m$ as follows. First compute the mean, $\bar{x}$, and standard deviation, $s$, of 
the series. Then calculate the adjusted partial sums 


$$S_k = \sum_{t=1}^{k} x_t - k\bar{x}$$


for $k = 1, \ldots, m$. Notice that $S_m$ must equal zero and that large 
deviations from 0 are indicative of persistence. The rescaled adjusted range 


$$R_m = \{\max(S_1, \ldots, S_m) - \min(S_1, \ldots, S_m)\}/s$$


is the difference between the largest surplus and the greatest deficit. If we 
have a long time series of length $n$, we can calculate $R_m$ for values of $m$ 
from, for example, 20 upwards to $n$ in steps of 10. When $m$ is less than 
$n$, we can calculate $n - m$ values for $R_m$ by starting at different points in 
the series. Hurst plotted $\ln(R_m)$ against $\ln(m)$ for many long time series. 
He noticed that lines fitted through the points were usually steeper for 
geophysical series, such as streamflow, than for realisations of independent 
Gaussian variables (Gaussian DWN). The average value of the slope ($H$) 
of these lines for the geophysical time series was 0.73, significantly higher 
than the average slope of 0.5 for the independent sequences. The linear 
logarithmic relationship is equivalent to 


$$R_m \propto m^H$$


Plot $\ln(R_m)$ against $\ln(m)$ for the detrended Nile River minimum flows. 


5. a) Refer to the data in LAN.txt and the time series of logarithms of the 
numbers of packet arrivals, with 1 added, in 10-ms intervals 
calculated from the numbers of packet arrivals. Fit a GARCH model to the 
residuals from the AR(26) model fitted to the fractionally differenced 
time series. 

b) Calculate the residuals from the GARCH model, and fit a suitable 
distribution to these residuals. 

c) Calculate the mean number of packets arriving in 10-ms intervals. Set 
up a simulation model for a router that has a realisation of the model 
in (a) as input and can send out packets at a constant rate equal to 
the product of the mean number of packets arriving in 10-ms intervals 
with a factor g, which is greater than 1. 

d) Code the model fitted in (a) so that it will provide simulations of 
time series of the number of packets that are the input to the router. 
Remember that you first obtain a realisation for In(number of packets 
+ 1) and then take the exponential of this quantity, subtract 1, and 
round the result to the nearest integer. 

e) Compare the results of your simulation with a model that assumes 
Gaussian white noise for the residuals of the AR(26) model for g = 
1.05, 1.1, 1.5, and 2. 
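The rescaled adjusted range defined in Exercise 4 can be computed along the following lines. This is a sketch only: rescaled.range is a hypothetical helper name, the white noise series is simulated rather than taken from any data file, and for brevity each $R_m$ is calculated from the first $m$ values only instead of from several starting points.

```r
# Hypothetical helper: rescaled adjusted range R_m of a series x of
# length m, following the definition in Exercise 4.
rescaled.range <- function(x) {
  S <- cumsum(x) - seq_along(x) * mean(x)    # adjusted partial sums S_1, ..., S_m
  (max(S) - min(S)) / sd(x)
}

x <- c(1, 2, 3, 4, 5)
rescaled.range(x)         # S = -2, -3, -3, -2, 0, so R = 3 / sd(x)

# Slope of ln(R_m) against ln(m) for simulated Gaussian white noise
set.seed(1)
w <- rnorm(2000)
m <- seq(20, 2000, by = 10)
R <- sapply(m, function(mm) rescaled.range(w[1:mm]))
slope <- unname(coef(lm(log(R) ~ log(m)))[2])
slope
plot(log(m), log(R))
```

Because the windows here are nested and share a starting point, the estimated slope from a single realisation is quite variable.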


Spectral Analysis 


9.1 Purpose 


Although it follows from the definition of stationarity that a stationary time 
series model cannot have components at specific frequencies, it can never- 
theless be described in terms of an average frequency composition. Spectral 
analysis distributes the variance of a time series over frequency, and there are 
many applications. It can be used to characterise wind and wave forces, which 
appear random but have a frequency range over which most of the power is 
concentrated. The British Standard BS6841, “Measurement and evaluation of 
human exposure to whole-body vibration”, uses spectral analysis to quantify 
exposure of personnel to vibration and repeated shocks. Many of the early 
applications of spectral analysis were of economic time series, and there has 
been recent interest in using spectral methods for economic dynamics analysis 
(Iacobucci and Noullez, 2005). 

More generally, spectral analysis can be used to detect periodic signals 
that are corrupted by noise. For example, spectral analysis of vibration signals 
from machinery such as turbines and gearboxes is used to expose faults before 
they cause catastrophic failure. The warning is given by the emergence of new 
peaks in the spectrum. Astronomers use spectral analysis to measure the red 
shift and hence deduce the speeds of galaxies relative to our own. 


9.2 Periodic signals 


9.2.1 Sine waves 


Any signal that has a repeating pattern is periodic, with a period equal to 
the length of the pattern. However, the fundamental periodic signal in mathe- 
matics is the sine wave. Joseph Fourier (1768-1830) showed that sums of sine 
waves can provide good approximations to most periodic signals, and spectral 
analysis is based on sine waves. 


P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 
Use R, DOI 10.1007/978-0-387-88698-5_9, 
© Springer Science+Business Media, LLC 2009 




Spectral analysis can be confusing because different authors use different 
notation. For example, frequency can be given in radians or cycles per sam- 
pling interval, and frequency can be treated as positive or negative, or just 
positive. You need to be familiar with the sine wave defined with respect to 
a unit circle, and this relationship is so fundamental that the sine and cosine 
functions are called circular functions. 

Imagine a circle with unit radius and centre at the origin, O, with the
radius rotating at a rotational velocity of $\omega$ radians per unit of time.
Let $t$ be time. The angle, $\omega t$, in radians is measured as the distance
around the circumference from the positive real (horizontal) axis, with the
anti-clockwise rotation defined as positive (Fig. 9.1). So, if the radius sweeps
out a full circle, it has been rotated through an angle of $2\pi$ radians. The
time taken for this one revolution, or cycle, is $2\pi/\omega$ and is known as
the period.

The sine function, $\sin(\omega t)$, is the projection of the radius onto the
vertical axis, and the cosine function, $\cos(\omega t)$, is the projection of
the radius onto the horizontal axis. In general, a sine wave of frequency
$\omega$, amplitude $A$, and phase $\psi$ is

$$A \sin(\omega t + \psi) \qquad (9.1)$$

The positive phase shift represents an advance of $\psi/2\pi$ cycles. In spectral
analysis, it is convenient to refer to specific sine waves as harmonics. We rely
on the trigonometric identity that expresses a general sine wave as a weighted
sum of sine and cosine functions:

$$A \sin(\omega t + \psi) = A \cos(\psi) \sin(\omega t) + A \sin(\psi) \cos(\omega t) \qquad (9.2)$$

Equation (9.2) is fundamental for spectral analysis because a sampled sine
wave of any given amplitude and phase can be fitted by a linear regression
model with the sine and cosine functions as predictor variables.
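
The regression idea behind Equation (9.2) can be tried on simulated data. The
following is a minimal sketch rather than code from the text: the frequency,
amplitude, phase, noise level, and series length are arbitrary illustrative
assumptions. A noisy sine wave is generated, a regression on $\sin(\omega t)$
and $\cos(\omega t)$ is fitted with lm, and the amplitude and phase are
recovered from the two fitted coefficients.

```r
# Illustrative values only (assumptions, not taken from the text)
set.seed(1)
omega <- 2 * pi / 20        # frequency in radians per sampling interval
A     <- 3                  # amplitude
psi   <- pi / 4             # phase
t     <- 1:200
x     <- A * sin(omega * t + psi) + rnorm(length(t), sd = 0.5)

# Equation (9.2): x = [A cos(psi)] sin(omega t) + [A sin(psi)] cos(omega t)
fit <- lm(x ~ sin(omega * t) + cos(omega * t))
b   <- unname(coef(fit))    # b[1] intercept, b[2] sine coef, b[3] cosine coef

A.hat   <- sqrt(b[2]^2 + b[3]^2)   # estimated amplitude
psi.hat <- atan2(b[3], b[2])       # estimated phase
c(A.hat = A.hat, psi.hat = psi.hat)
```

The estimates should be close to the assumed amplitude and phase, with the
discrepancy reflecting the added noise.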


9.2.2 Unit of measurement of frequency 


The SI¹ unit of frequency is the hertz (Hz), which is 1 cycle per second and
equivalent to $2\pi$ radians per second. The hertz is a derived SI unit, and in
terms of fundamental SI units it has unit s$^{-1}$. A frequency of $f$ cycles per
second is equivalent to $\omega$ radians per second, where

$$\omega = 2\pi f \quad \Leftrightarrow \quad f = \frac{\omega}{2\pi} \qquad (9.3)$$

The mathematics is naturally expressed in radians, but Hz is generally used
in physical applications. By default, R plots have a frequency axis calibrated
in cycles per sampling interval.


1 SI is the International System of Units, abbreviated from the French Le Système
International d'Unités.

Fig. 9.1. Angle $\omega t$ is the arc length along the circumference of the unit
circle. The projection of the radius onto the x and y axes is $\cos(\omega t)$
and $\sin(\omega t)$, respectively.


9.3 Spectrum 
9.3.1 Fitting sine waves 


Suppose we have a time series of length $n$, $\{x_t : t = 1, \ldots, n\}$, where
it is convenient to arrange that $n$ is even, if necessary by dropping the first
or last term. We can fit a time series regression with $x_t$ as the response and
$n - 1$ predictor variables:

$$\cos\left(\frac{2\pi t}{n}\right), \sin\left(\frac{2\pi t}{n}\right), \cos\left(\frac{4\pi t}{n}\right), \sin\left(\frac{4\pi t}{n}\right), \cos\left(\frac{6\pi t}{n}\right), \sin\left(\frac{6\pi t}{n}\right), \ldots,$$

$$\cos\left(\frac{2(n/2-1)\pi t}{n}\right), \sin\left(\frac{2(n/2-1)\pi t}{n}\right), \cos(\pi t).$$

We will denote the estimated coefficients by $a_1, b_1, a_2, b_2, a_3, b_3, \ldots,
a_{n/2-1}, b_{n/2-1}, a_{n/2}$, respectively, so

$$x_t = a_0 + a_1 \cos\left(\frac{2\pi t}{n}\right) + b_1 \sin\left(\frac{2\pi t}{n}\right) + \cdots
      + a_{n/2-1} \cos\left(\frac{2(n/2-1)\pi t}{n}\right) + b_{n/2-1} \sin\left(\frac{2(n/2-1)\pi t}{n}\right)
      + a_{n/2} \cos(\pi t)$$

Since the number of coefficients equals the length of the time series, there are
no degrees of freedom for error. The intercept term, $a_0$, is just the mean
$\bar{x}$. The lowest frequency is one cycle, or $2\pi$ radians, per record
length, which is $2\pi/n$ radians per sampling interval. A general frequency, in
this representation, is $m$ cycles per record length, equivalent to $2\pi m/n$
radians per sampling interval, where $m$ is an integer between 1 and $n/2$. The
highest frequency is $\pi$ radians per sampling interval, or equivalently 0.5
cycles per sampling interval, and it makes $n/2$ cycles in the record length,
alternating between $-1$ and $+1$ at the sampling points. This regression model
is a finite Fourier series for a discrete time series.²

We will refer to the sine wave that makes $m$ cycles in the record length
as the $m$th harmonic, and the first harmonic is commonly referred to as the
fundamental frequency. The amplitude of the $m$th harmonic is

$$A_m = \sqrt{a_m^2 + b_m^2}$$

Parseval's Theorem is the key result, and it expresses the variance of the time
series as a sum of $n/2$ components at integer frequencies from 1 to $n/2$ cycles
per record length:

$$\frac{1}{n} \sum_{t=1}^{n} x_t^2 = A_0^2 + \frac{1}{2} \sum_{m=1}^{(n/2)-1} A_m^2 + A_{n/2}^2$$

$$\mathrm{Var}(x) = \frac{1}{2} \sum_{m=1}^{(n/2)-1} A_m^2 + A_{n/2}^2 \qquad (9.4)$$


Parseval's Theorem follows from the fact that the sine and cosine terms used
as explanatory terms in the time series regression are uncorrelated, together
with the result for the variance of a linear combination of variables
(Exercise 1). A summary of the harmonics, and their corresponding frequencies
and periods,³ follows:


harmonic   period          frequency            frequency           contribution
                           (cycle/samp. int.)   (rad/samp. int.)    to variance

$1$        $n$             $1/n$                $2\pi/n$            $\frac{1}{2} A_1^2$
$2$        $n/2$           $2/n$                $4\pi/n$            $\frac{1}{2} A_2^2$
$3$        $n/3$           $3/n$                $6\pi/n$            $\frac{1}{2} A_3^2$
...        ...             ...                  ...                 ...
$n/2-1$    $n/(n/2-1)$     $(n/2-1)/n$          $(n-2)\pi/n$        $\frac{1}{2} A_{n/2-1}^2$
$n/2$      $2$             $1/2$                $\pi$               $A_{n/2}^2$


Although we have introduced the $A_m$ in the context of a time series
regression, the calculations are usually performed with the fast Fourier
transform algorithm (FFT). We say more about this in §9.7.
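
The regression and Parseval's Theorem can be checked numerically. The sketch
below is illustrative only: the series is simulated white noise of an
arbitrarily chosen length 16, and the object names are our own. It fits the
saturated regression with lm, forms the $A_m^2$, compares the right-hand side
of Equation (9.4) with the variance computed with divisor $n$, and confirms
that the same amplitudes are obtained from fft.

```r
set.seed(1)
n <- 16                                  # even record length (arbitrary choice)
x <- rnorm(n)
t <- 1:n
m <- 1:(n / 2 - 1)                       # harmonics below the highest frequency

# Predictor variables: cosines and sines at 2*pi*m/n, plus cos(pi * t)
C  <- outer(t, m, function(t, m) cos(2 * pi * m * t / n))
S  <- outer(t, m, function(t, m) sin(2 * pi * m * t / n))
cn <- cos(pi * t)

fit <- lm(x ~ C + S + cn)                # n coefficients, no error df
co  <- unname(coef(fit))
a0     <- co[1]                          # intercept, equals mean(x)
a      <- co[1 + m]                      # cosine coefficients a_1, ..., a_{n/2-1}
b      <- co[1 + length(m) + m]          # sine coefficients b_1, ..., b_{n/2-1}
a.half <- co[n]                          # coefficient of cos(pi * t)

A2       <- a^2 + b^2                    # squared amplitudes
var.n    <- mean((x - mean(x))^2)        # variance with divisor n
parseval <- 0.5 * sum(A2) + a.half^2     # right-hand side of Equation (9.4)
all.equal(var.n, parseval)

# The same amplitudes from the FFT: A_m = 2 |X_m| / n for m < n/2
X <- fft(x)
all.equal(sqrt(A2), 2 * Mod(X)[m + 1] / n)
```

Note that the variance here uses divisor $n$ rather than the $n - 1$ used by
var in R.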


2 A Fourier series is an approximation to a signal defined for continuous time over
a finite period. The signal may have discontinuities. The Fourier series is the sum
of an infinite number of sine and cosine terms.

3 The period of a sine wave is the time taken for 1 cycle and is the reciprocal of
the frequency measured in cycles per time unit.


9.3.2 Sample spectrum


A plot of $A_m^2$, as spikes, against $m$ is a Fourier line spectrum. The raw
periodogram in R is obtained by joining the tips of the spikes in the Fourier
line spectrum to give a continuous plot and scaling it so that the area equals
the variance. The periodogram distributes the variance over frequency, but it
has two drawbacks. The first is that the precise set of frequencies is arbitrary
inasmuch as it depends on the record length. The second is that the
periodogram does not become smoother as the length of the time series increases
but just includes more spikes packed closer together. The remedy is to smooth
the periodogram by taking a moving average of spikes before joining the tips.
The smoothed periodogram is also known as the (sample) spectrum. We
denote the spectrum of $\{x_t\}$ by $C_{xx}()$, with an argument $\omega$ or $f$
depending on whether it is expressed in radians or cycles per sampling interval.
However, the smoothing will reduce the heights of peaks, and excessive smoothing
will blur the features we are looking for. It is a good idea to consider spectra
with different amounts of smoothing, and this is made easy for us with the R
function spectrum. The argument span is the number of spikes in the moving
average,⁴ and a useful guide for an initial value, for time series of lengths
up to a thousand, is the square root of twice the record length.

The time series should either be mean adjusted (mean subtracted) before
calculating the periodogram or the $a_0$ spike should be set to 0 before averaging
spikes to avoid increasing the low-frequency contributions to the variance. In
R, the spectrum function goes further than this and removes a linear trend 
from the series before calculating the periodogram. It seems appropriate to fit 
a trend and remove it if the existence of a trend in the underlying stochastic 
process is plausible. Although this will usually pertain, there may be cases in 
which you wish to attribute an apparent trend in a time series to a fractionally 
differenced process, and prefer not to remove a fitted trend. You could then use 
the fft function and average the spikes to obtain a spectrum of the unadjusted 
time series (§9.7). 

The spectrum does not retain the phase information, though in the case 
of stationary time series all phases are equally likely and the sample phases 
have no theoretical interest. 


9.4 Spectra of simulated series 
9.4.1 White noise 


We will start by generating an independent random sample from a normal
distribution. This is a realisation of a Gaussian white noise process. If no span
is specified in the spectrum function, R will use the heights of the Fourier line
spectrum spikes to construct a spectrum with no smoothing.⁵ We compare
this with a span of 65 in Figure 9.2.


4 Weighted moving averages can be used, and the choice of weights determines the
spectral window.


layout(1:2)
set.seed(1)
x <- rnorm(2048)
spectrum(x, log = c("no"))
spectrum(x, span = 65, log = c("no"))

Fig. 9.2. Realisation of Gaussian white noise: (a) raw periodogram; (b) spectrum
with span = 65.


The default is a logarithmic scale for the spectrum, but we have changed 
this by setting the log parameter to "no". The frequency axis is cycles per 
sampling interval. 

The second spectrum is much smoother as a result of the moving average
of 65 adjacent spikes. Both spectra are scaled so that their area is one-half
the variance of the time series. The rationale for this is that the spectrum is
defined from $-0.5$ to $0.5$, and is symmetric about 0. However, in the context of
spectral analysis, there is no useful distinction between positive and negative
frequencies, and it is usual to plot the spectrum over $[0, 0.5]$, scaled so that
its area equals the variance of the signal. So, for a report it is better to
multiply the R spectrum by a factor of 2 and to use hertz rather than cycles per
sampling interval for frequency. You can easily do this with the following R
commands, assuming the width of the sampling interval is Del (which would need
to be assigned first):


5 By default, spectrum applies a taper to the first 10% and last 10% of the series and
pads the series to a highly composite length. However, 2048 is highly composite,
and the taper has little effect on a realisation of this length.

> x.spec <- spectrum(x, span = 65, log = c("no"))
> spx <- x.spec$freq / Del
> spy <- 2 * x.spec$spec
> plot(spx, spy, xlab = "Hz", ylab = "variance/Hz", type = "l")
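
The area scaling of the default R periodogram can be checked numerically. The
following is a sketch under stated assumptions: the taper is switched off
(taper = 0) and only the mean is removed (detrend = FALSE, demean = TRUE), so
that the ordinates can be compared exactly with an FFT of the mean-adjusted
series; these settings differ from the defaults of spectrum described in the
text. The ordinates are spaced $1/n$ apart, so the area is their sum divided
by $n$.

```r
set.seed(1)
x <- rnorm(2048)
n <- length(x)

# Raw periodogram: no taper, mean removed, no detrending, no plot
raw <- spec.pgram(x, taper = 0, detrend = FALSE, demean = TRUE, plot = FALSE)

# The ordinates agree with |X_m|^2 / n from the FFT of the mean-adjusted series
X <- fft(x - mean(x))
pgram.fft <- (Mod(X)^2 / n)[2:(n / 2 + 1)]
all.equal(as.numeric(raw$spec), pgram.fft)

# Area under the R periodogram over [0, 0.5]
area <- sum(raw$spec) / n
c(area = area, half.variance = var(x) / 2)
```

The two values in the last line should agree closely, which is why the R
ordinates are doubled when the spectrum is plotted over $[0, 0.5]$ for a
report.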


The theoretical spectrum for independent random variation with variance 
of unity is flat at 2 over the range [0,0.5]. The name white noise is chosen 
to be reminiscent of white light made up from equal contributions of energy 
across the visible spectrum. An explanation for the flat spectrum arises from 
the regression model. If we have independent random errors, the $E[a_m]$ and
$E[b_m]$ will all be 0 and the $E[A_m^2]$ are all equal. Notice that the vertical scale
for the smoothed periodogram is from 0.8 to 1.4, so it is relatively flat (Fig. 
9.2). If longer realisations are generated and the bandwidth is held constant, 
the default R spectra will tend towards a flat line at a height of 1. 

The bandwidths shown in Figure 9.2 are calculated from the R definition
of bandwidth as span $\times \{0.5/(n/2)\}/\sqrt{12}$. A more common definition
of bandwidth in the context of spectral analysis is span$/(n/2)$ cycles per
sampling interval. The latter definition is the spacing between statistically
independent estimates of the spectrum height, and it is larger than the R
bandwidth by a factor of $2\sqrt{12} \approx 6.93$.
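
The ratio of the two bandwidth definitions follows directly from the
expressions given above, with both taken in cycles per sampling interval; the
span and $n$ cancel:

```latex
\frac{\text{span}/(n/2)}{\text{span} \times \{0.5/(n/2)\}/\sqrt{12}}
  = \frac{2/n}{1/(n\sqrt{12})}
  = 2\sqrt{12}
  \approx 6.93
```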

The spectrum distributes variance over frequency, and the expected shape
does not depend on the distribution that is being sampled. You are asked to 
investigate the effect, if any, of using random numbers from an exponential, 
rather than normal, distribution in Exercise 2. 


9.4.2 AR(1): Positive coefficient 


We generate a realisation of length 1024 from an AR(1) process with $\alpha$
equal to 0.9 and compare the time series plot, correlogram, and spectrum in
Figure 9.3.


set.seed(1)
x <- w <- rnorm(1024)
for (t in 2:1024) x[t] <- 0.9 * x[t-1] + w[t]
layout(1:3)
plot(as.ts(x))
acf(x)
spectrum(x, span = 51, log = c("no"))

Fig. 9.3. Simulated AR(1) process with $\alpha = 0.9$: (a) time plot; (b)
correlogram; (c) spectrum.


The plot of the time series shows the tendency for consecutive values to
be relatively similar, and change is relatively slow, so we might expect the 
spectrum to pick up low-frequency variation. The acf quantifies the tendency 
for consecutive values to be relatively similar. The spectrum confirms that 
low-frequency variation dominates. 


9.4.3 AR(1): Negative coefficient 


We now change $\alpha$ from 0.9 to $-0.9$. The plot of the time series (Fig. 9.4)
shows the tendency for consecutive values to oscillate, change is rapid, and we
expect the spectrum to pick up high-frequency variation. The acf quantifies
the tendency for consecutive values to oscillate, and the spectrum shows
high-frequency variation.
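
The text does not list the code for this case. A minimal sketch, assuming the
same seed, series length, and span as in §9.4.2 (the settings actually used
for Figure 9.4 are not stated, so these are assumptions), is:

```r
set.seed(1)
x <- w <- rnorm(1024)
# AR(1) recursion with alpha = -0.9
for (t in 2:1024) x[t] <- -0.9 * x[t - 1] + w[t]
layout(1:3)
plot(as.ts(x))
acf(x)
spectrum(x, span = 51, log = c("no"))
```

The only change from the positive-coefficient case is the sign of the
coefficient in the recursion.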


9.4.4 AR(2) 


Consider an AR(2) process with parameters 1 and $-0.6$. This can be
interpreted as a second-order difference equation describing the motion of a
lightly damped single mode system (Exercise 3), such as a mass on a spring,
subjected to a sequence of white noise impulses. The spectrum in Figure 9.5
shows a peak at the natural frequency of the system, that is, the frequency at
which the mass will oscillate if the spring is extended and then released.

Fig. 9.4. Simulated AR(1) process with $\alpha = -0.9$: (a) time plot; (b)
correlogram; (c) spectrum.
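
The location of the spectral peak can be anticipated before simulating. The
sketch below is not from the text; it relies on the standard result, not
derived in this section, that the spectrum of an AR(2) process with parameters
$\alpha_1$ and $\alpha_2$ is proportional to
$1/|1 - \alpha_1 e^{-i\omega} - \alpha_2 e^{-2i\omega}|^2$. The grid of
frequencies and the object names are illustrative assumptions.

```r
alpha1 <- 1
alpha2 <- -0.6
f <- seq(0, 0.5, length.out = 5001)      # cycles per sampling interval
z <- exp(-2i * pi * f)

# AR(2) spectrum for white noise input, up to a constant scale factor
spec.ar2 <- 1 / Mod(1 - alpha1 * z - alpha2 * z^2)^2
f.peak <- f[which.max(spec.ar2)]         # frequency of the peak on the grid
f.peak

# Closed form from setting the derivative of the denominator to zero
f.closed <- acos(alpha1 * (alpha2 - 1) / (4 * alpha2)) / (2 * pi)
f.closed
```

Both calculations place the peak at roughly 0.13 cycles per sampling interval,
which can be compared with the peak in the estimated spectrum.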


set.seed(1)
x <- w <- rnorm(1024)
for (t in 3:1024) x[t] <- x[t-1] - 0.6 * x[t-2] + w[t]
layout(1:3)
plot(as.ts(x))
acf(x)
spectrum(x, span = 51, log = c("no"))

Fig. 9.5. Simulated AR(2) process with $\alpha_1 = 1$ and $\alpha_2 = -0.6$: (a)
time plot; (b) correlogram; (c) spectrum.


9.5 Sampling interval and record length


Many time series are of an inherently continuous variable that is sampled to
give a time series at discrete time steps. For example, the National Climatic
Data Center (NCDC) provides 1-minute readings of temperature, wind speed,
and pressure at meteorological stations throughout the United States. It is
crucial that the continuous signal be sampled at a sufficiently high rate to
retain all its information. If the sampling rate is too low, we not only lose
information but will mistake high-frequency variation for variation at a lower
frequency. This latter phenomenon is known as aliasing and can have serious
consequences.

In signal processing applications, the measurement device may return a
voltage as a continuously varying electrical signal. However, analysis is
usually performed on a digital computer, and the signal has to be sampled to give
a time series at discrete time steps. The sampling is known as analog-to-digital
conversion (A/D). Modern oscilloscopes sample at rates as high as Giga
samples per second (GS/s) and have anti-alias filters, built from electronic
components, that remove any higher-frequency components in the original
continuous signal. Digital recordings of musical performances are typically
sampled at rates of 1 Mega sample per second (MS/s) after any higher frequencies
have been removed with anti-alias filters. Since the frequency range of human
hearing is from about 15 to 20,000 Hz, sampling rates of 1 MS/s are quite
adequate for high-fidelity recordings.


9.5.1 Nyquist frequency


The Nyquist frequency is the cutoff frequency associated with a given sam- 
pling rate and is one-half the sampling frequency. Once a continuous signal 
is sampled, any frequency higher than the Nyquist frequency will be indistin- 
guishable from its low-frequency alias. 

To understand this phenomenon, suppose the sampling interval is $\Delta$ and
the corresponding sampling frequency is $1/\Delta$ samples per second. A sine
wave with a frequency of $1/\Delta$ cycles per second is generated by the radius
in Figure 9.1 rotating anti-clockwise at a rate of 1 revolution per sampling
interval $\Delta$, and it follows that it cannot be detected when sampled at this
rate. Similarly, a sine wave with a frequency of $-1/\Delta$ cycles per second,
generated by the radius in Figure 9.1 rotating clockwise at a rate of 1
revolution per sampling interval $\Delta$, is also undetectable. Now consider a
sine wave with a frequency $f$ that lies within the interval
$[-1/(2\Delta), 1/(2\Delta)]$. This sine wave will be indistinguishable
from any sine wave generated by a radius that completes an integer number
of additional revolutions, anti-clockwise or clockwise, during the sampling
interval. More formally, the frequency $f$ will be indistinguishable from

$$f + \frac{k}{\Delta} \qquad (9.5)$$

where $k$ is an integer. Figure 9.6 shows a sine function with a frequency of
1 Hz, $\sin(2\pi t)$, sampled at 0.2 s, together with its alias when $k$ in
Equation (9.5) equals $-1$. This alias frequency is $1 - 1/0.2$, which equals
$-4$ Hz. Physically, a frequency of $-4$ Hz is identical to a frequency of 4 Hz,
except for a phase difference of half a cycle
($\sin(-\theta) = -\sin(\theta) = \sin(\theta - \pi)$).


t <- (0:10) / 5
tc <- (0:2000) / 1000
x <- sin(2 * pi * t)
xc <- sin(2 * pi * tc)
xa <- sin(-4 * 2 * pi * tc)
plot(t, x)
lines(tc, xc)
lines(tc, xa, lty = "dashed")

To summarise, the Nyquist frequency $Q$ is related to the sampling interval
$\Delta$ by

$$Q = \frac{1}{2\Delta} \qquad (9.6)$$

and $Q$ should be higher than any frequency components in the continuous
signal.
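
The aliasing shown in Figure 9.6 can also be confirmed numerically. The
following sketch uses the sampling interval and frequencies of the example
above; the object names are our own. It evaluates the 1 Hz sine wave and its
alias from Equation (9.5) at the sampling times only and shows that the
sampled values coincide.

```r
Delta <- 0.2                        # sampling interval (s)
t <- (0:10) * Delta                 # sampling times
f <- 1                              # signal frequency (Hz)
k <- -1
f.alias <- f + k / Delta            # Equation (9.5): -4 Hz

x  <- sin(2 * pi * f * t)           # sampled 1 Hz sine wave
xa <- sin(2 * pi * f.alias * t)     # sampled alias
max(abs(x - xa))                    # zero apart from rounding error

nyquist <- 1 / (2 * Delta)          # Equation (9.6): 2.5 Hz
```

The alias at 4 Hz lies above the Nyquist frequency of 2.5 Hz, which is why it
cannot be distinguished from the 1 Hz signal at this sampling rate.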


9.5.2 Record length 


Fig. 9.6. Aliased frequencies: 1 Hz and 4 Hz with $\Delta = 0.2$ second.

To begin with, we need to establish the highest frequency we can expect to
encounter and set the Nyquist frequency $Q$ well above this. The Nyquist
frequency determines the sampling interval, $\Delta$, from Equation (9.6). If the
time series has length $n$, the record length, $T$, is $n\Delta$. The fundamental
frequency is $1/T$ Hz, and this is the spacing between spikes in the Fourier line
spectrum. If we wish to distinguish frequencies separated by $\epsilon$ Hz, we
should aim for independent estimates of the spectrum centred on these
frequencies. This implies that the bandwidth must be at most $\epsilon$. If we
take a moving average of $L$ spikes in the Fourier line spectrum, we have the
following relationship:

$$\frac{2L}{n\Delta} = \frac{2L}{T} \le \epsilon \qquad (9.7)$$
For example, suppose we wish to distinguish frequencies separated by 1 Hz
in an audio recording. A typical sampling rate for audio recording is 1 MS/s,
corresponding to $\Delta = 0.000001$. If we take $L$ equal to 100, it follows from
Equation (9.7) that $n$ must exceed $200 \times 10^6$. This is a long time series
but the record length is less than four minutes. If a time series of this length
presents computational problems, an alternative method for computing a smoothed
spectrum is to calculate the Fourier line spectrum for the 100 subseries of two
million observations and average these 100 Fourier line spectra.


9.6 Applications


9.6.1 Wave tank data 


The data in the file wave.dat are the surface height, relative to still water
level, of water at the centre of a wave tank sampled over 39.6 seconds at a
rate of 10 samples per second. The aim of the analysis is to check whether the
spectrum is a realistic emulation of typical sea spectra. Referring to Figure
9.7, the time series plot gives a general impression of the wave profile over time
and we can see that there are no obvious erroneous values. The correlogram
is qualitatively similar to that for a realisation of an AR(2) process,⁶ but
an AR(2) model would not account for a second peak in the spectrum at a
frequency near 0.09.


www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
wavetank.dat <- read.table(www, header = T)
attach(wavetank.dat)
layout(1:3)
plot(as.ts(waveht))
acf(waveht)
spectrum(waveht)

The default method of fitting the spectrum used above does not require the
ar function. However, the ar function is used in §9.9 and selects an AR(13)
model. The shape of the estimated spectrum in Figure 9.7 is similar to that
of typical sea spectra.


9.6.2 Fault detection on electric motors 


Induction motors are widely used in industry, and although they are generally 
reliable, they do require maintenance. A common fault is broken rotor bars,
which reduce the output torque capability and increase vibration, and if left 
undetected can lead to catastrophic failure of the electric motor. The measured 
current spectrum of a typical motor in good condition will have a spike at 
mains frequency, commonly 50 Hz, with side band peaks at 46 Hz and 54 Hz. 
If a rotor bar breaks, the magnitude of the side band peaks will increase by a 
factor of around 10. This increase can easily be detected in the spectrum. 

Siau et al. (2004) compare current spectra for an induction motor in good 
condition and with one broken bar. They sample the current at 0.0025-second 
intervals, corresponding to a Nyquist frequency of 200 Hz, and calculate spec- 
tra from records of 100 seconds length. The time series have length 40,000, 
and the bandwidth with a span of 60 is 1.2 Hz (Equation (9.7)). 
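
The bandwidth quoted here, and the record length required in the audio example
of §9.5.2, both follow from Equation (9.7). A small sketch of the arithmetic is
given below; the function names bw.hz and n.req are hypothetical helpers
introduced only for this illustration.

```r
# Bandwidth in Hz for a moving average of L spikes: 2L / (n * Delta)
bw.hz <- function(L, n, Delta) 2 * L / (n * Delta)

# Series length at which the bandwidth equals eps Hz: n = 2L / (Delta * eps)
n.req <- function(L, Delta, eps) 2 * L / (Delta * eps)

bw.motor <- bw.hz(L = 60, n = 40000, Delta = 0.0025)   # induction motor records
n.audio  <- n.req(L = 100, Delta = 1e-6, eps = 1)      # audio example of Section 9.5.2
c(bw.motor = bw.motor, n.audio = n.audio)
```

The first value reproduces the 1.2 Hz bandwidth for the motor records, and the
second the $200 \times 10^6$ observations needed for the audio recording.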

The data are in the file imotor.txt. R code for drawing the spectra (Fig. 
9.8) follows. The broken bar condition is indicated clearly by the higher side 
band peaks in the spectrum. In contrast, the standard deviations of the good 
condition and broken condition time series are very close. 


6 The pacf, not shown here, also suggests that an AR(2) model would be plausible.

Fig. 9.7. Wave elevation series: (a) time plot; (b) correlogram; (c) spectrum.


> www <- "http://www.massey.ac.nz/~pscowper/ts/imotor.txt"
> imotor.dat <- read.table(www, header = T)
> attach(imotor.dat)
> xg.spec <- spectrum(good, span = 9)
> xb.spec <- spectrum(broken, span = 9)
> freqg <- 400 * xg.spec$freq[4400:5600]
> freqb <- 400 * xb.spec$freq[4400:5600]
> plot(freqg, 10 * log10(xg.spec$spec[4400:5600]), main = "",
       xlab = "Frequency (Hz)", ylab = "Current spectrum (dB)", type = "l")
> lines(freqb, 10 * log10(xb.spec$spec[4400:5600]), lty = "dashed")
> sd(good)
[1] 7071.166
> sd(broken)
[1] 7071.191


Fig. 9.8. Spectrum of current signal from induction motor in good condition (solid)
and with broken rotor bar (dotted). Frequency is in cycles per 0.0025 second
sampling interval.


9.6.3 Measurement of vibration dose


The drivers of excavators in open cast mines are exposed to considerable
mechanical vibration. The British Standard Guide BS6841:1987 is routinely used
to quantify the effects. A small engineering company has developed an active
vibration absorber for excavators and has carried out tests. The company has
accelerometer measurements of the acceleration in the forward (x), sideways
(y), and vertical (z) directions during a rock-cutting operation. The estimated
vibration dose value is defined as


$$\mathrm{eVDV} = \left[(1.4 \times \bar{a})^4 \times T\right]^{1/4} \qquad (9.8)$$

where $\bar{a}$ is the root mean square value of frequency-weighted acceleration
(m s$^{-2}$) and $T$ is the duration (s). The mean square frequency-weighted
acceleration in the vertical direction is estimated by

$$\bar{a}_z^2 = \int C_{\ddot{z}\ddot{z}}(f)\, W(f)\, df \qquad (9.9)$$

where the weighting function, $W(f)$, represents the relative severity of
vibration at different frequencies for a driver, and the acceleration time
series is the second derivative of the displacement signal, denoted $z$.
Components in the forward and sideways directions are defined similarly, and
then $\bar{a}$ is calculated as

$$\bar{a} = (\bar{a}_x^2 + \bar{a}_y^2 + \bar{a}_z^2)^{1/2} \qquad (9.10)$$
The data in the file zdd.txt are acceleration in the vertical direction (mm
s$^{-2}$) measured over a 5-second period during a rock-cutting operation. The
sampling rate is 200 per second, and analog anti-aliasing filters were used to
remove any frequencies above 100 Hz in the continuous voltage signal from the
accelerometer. The frequency-weighting function was supplied by a medical
consultant. It is evaluated at 500 frequencies to match the spacing of the
spectrum ordinates and is given in vibdoswt.txt. The R routine has been
written to give diagrams in physical units, as required for a report.⁷


> www <- "http://www.massey.ac.nz/~pscowper/ts/zdd.txt"
> zdotdot.dat <- read.table(www, header = T)
> attach(zdotdot.dat)
> www <- "http://www.massey.ac.nz/~pscowper/ts/vibdoswt.txt"
> wt.dat <- read.table(www, header = T)
> attach(wt.dat)
> acceln.spec <- spectrum(Accelnz, span = sqrt(2 * length(Accelnz)))
> Frequ <- 200 * acceln.spec$freq
> Sord <- 2 * acceln.spec$spec / 200
> Time <- (1:1000) / 200
> layout(1:3)
> plot(Time, Accelnz, xlab = "Time (s)",
       ylab = expression(mm~s^-2),
       main = "Acceleration", type = "l")
> plot(Frequ, Sord, main = "Spectrum", xlab = "Frequency (Hz)",
       ylab = expression(mm^2~s^-4~Hz^-1), type = "l")
> plot(Frequ, Weight, xlab = "Frequency (Hz)",
       main = "Weighting function", type = "l")
> sd(Accelnz)
[1] 234.487
> sqrt(sum(Sord * Weight) * 0.2)
[1] 179.9286


Suppose a driver is cutting rock for a 7-hour shift. The estimated root
mean square value of frequency-weighted acceleration is 179.9 (mm s$^{-2}$). If
we assume continuous exposure throughout the 7-hour period, the eVDV
calculated using Equation (9.8) is 3.17 (m s$^{-1.75}$). The British Standard
states that doses as high as 15 will cause severe discomfort but is non-committal
about safe doses arising from daily exposure. The company needs to record
acceleration measurements during rock-cutting operations on different
occasions, with and without the vibration absorber activated. It can then estimate
the decrease in vibration dose that can be achieved by fitting the vibration
absorber to an excavator (Fig. 9.9).
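
The eVDV quoted above can be reproduced with a few lines of arithmetic. This
is a sketch, taking Equation (9.8) in the form
$\mathrm{eVDV} = [(1.4 \times \bar{a})^4 \times T]^{1/4}$ and converting the
acceleration from mm s$^{-2}$ to m s$^{-2}$; the object names are our own.

```r
a.rms.mm <- 179.9               # rms frequency-weighted acceleration (mm s^-2)
a.rms <- a.rms.mm / 1000        # converted to m s^-2
T.sec <- 7 * 60 * 60            # 7-hour shift in seconds

eVDV <- ((1.4 * a.rms)^4 * T.sec)^(1/4)   # Equation (9.8), in m s^-1.75
round(eVDV, 2)
```

Because the dose grows only with the fourth root of the duration, halving the
exposure time reduces the eVDV by about 16%.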


7 Within R, type demo(plotmath) to see a list of mathematical operators that can
be used by the function expression for plots.

Fig. 9.9. Excavator series: (a) acceleration in vertical direction; (b) spectrum; (c)
frequency weighting function.


9.6.4 Climatic indices 


Climatic indices are strongly related to ocean currents, which have a major 
influence on weather patterns throughout the world. For example, El Niño is
associated with droughts throughout much of eastern Australia. A statistical 
analysis of these indices is essential for two reasons. Firstly, it helps us assess 
evidence of climate change. Secondly, it allows us to forecast, albeit with 
limited confidence, potential natural disasters such as droughts and to take 
action to mitigate the effects. Farmers, in particular, will modify their plans 
for crop planting if drought is more likely than usual. Spectral analysis enables 
us to identify any tendencies towards periodicities or towards persistence in 
these indices. 

The Southern Oscillation Index (SOI) is defined as the normalised pressure
difference between Tahiti and Darwin. El Niño events occur when the SOI is
strongly negative, and are associated with droughts in eastern Australia. The
monthly time series⁸ from January 1866 until December 2006 are in soi.txt.
The time series plot in Figure 9.10 is a useful check that the data have been
read correctly and gives a general impression of the range and variability of
the SOI. However, it is hard to discern any frequency information. The spectrum
is plotted with a logarithmic vertical scale and includes a 95% confidence
interval for the population spectrum in the upper right. The confidence interval
can be represented as a vertical line relative to the position of the sample
spectrum indicated by the horizontal line, because it has a constant width on
a logarithmic scale (§9.10.2). The spectrum has a peak at a low frequency, so
we enlarge the low-frequency section of the spectrum to identify this frequency
more precisely. It is about 0.022 cycles per month and corresponds to a period
of 45 months. However, the peak is small and lower-frequency contributions
to the spectrum are substantial, so we cannot expect a regular pattern of El
Niño events.


8 More details and the data are at http://www.cru.uea.ac.uk/cru/data/soi.htm.


www <- "http://www.massey.ac.nz/~pscowper/ts/soi.txt"
soi.dat <- read.table(www, header = T)
attach(soi.dat)
soi.ts <- ts(SOI, st = c(1866, 1), end = c(2006, 11), fr = 12)
layout(1:3)
plot(soi.ts)
soi.spec <- spectrum(SOI, span = sqrt(2 * length(SOI)))
plot(soi.spec$freq[1:60], soi.spec$spec[1:60], type = "l")

Fig. 9.10. Southern Oscillation Index: (a) time plot; (b) spectrum; (c) spectrum
for the low frequencies.


The Pacific Decadal Oscillation (PDO) index is the difference between an
average of sea surface temperature anomalies in the North Pacific Ocean
poleward of 20°N and the monthly mean global average anomaly.⁹ The monthly
time series from January 1900 until November 2007 is in pdo.txt. The
spectrum in Figure 9.11 has no noteworthy peak and increases as the frequency
becomes lower. The function spectrum removes a fitted linear trend before
calculating the spectrum, so the increase as the frequency tends to zero is
evidence of long-term memory in the PDO.


9 The time series data are available from http://jisao.washington.edu/pdo/.


www <- "http://www.massey.ac.nz/~pscowper/ts/pdo.txt"
pdo.dat <- read.table(www, header = T)
attach(pdo.dat)
pdo.ts <- ts(PDO, st = c(1900, 1), end = c(2007, 11), fr = 12)
layout(1:2)
plot(pdo.ts)
spectrum(PDO, span = sqrt(2 * length(PDO)))

Fig. 9.11. Pacific Decadal Oscillation: (a) time plot; (b) spectrum.


This analysis suggests that a FARIMA model might be suitable for modelling
the PDO and for generating future climate scenarios.


9.6.5 Bank loan rate 


The data in mprime.txt are the monthly percentage US Federal Reserve Bank
prime loan rate,¹⁰ courtesy of the Board of Governors of the Federal Reserve
System, from January 1949 until November 2007. We will plot the time series,
the correlogram, and a spectrum on a logarithmic scale (Fig. 9.12).


10 Data downloaded from Federal Reserve Economic Data at the Federal Reserve
Bank of St. Louis.

www <- "http://www.massey.ac.nz/~pscowper/ts/mprime.txt"
intr.dat <- read.table(www, header = T)
attach(intr.dat)
layout(1:3)
plot(as.ts(Interest), ylab = 'Interest rate')
acf(Interest)
spectrum(Interest, span = sqrt(length(Interest)) / 4)


The height of the spectrum increases as the frequency tends to zero (Fig.
9.12). This feature is similar to that observed in the spectrum of the PDO
series in §9.6.4 and is again indicative of long-term memory, although it is less
pronounced in the loan rate series. In §8.4.3, we found that the estimate of the
fractional differencing parameter was close to 0 and that the apparent long
memory could be adequately accounted for by high-order ARMA models.

Fig. 9.12. Federal Reserve Bank loan rates: (a) time plot; (b) correlogram; (c)
spectrum.


9.7 Discrete Fourier transform (DFT)* 


The theoretical basis for spectral analysis can be described succinctly in terms
of the discrete Fourier transform (DFT). The DFT requires the concept of
complex numbers and Euler's formula for a complex sinusoid, but the theory
then follows nicely. In R, complex numbers are handled by typing i following,
without a space, a numerical value; for example,


> z1 <- 2 + (0+3i)
> z2 <- -1 - (0+1i)
> z1 - z2
[1] 3+4i
> z1 * z2
[1] 1-5i
> abs(z1)
[1] 3.61


Euler’s formula for a complex sinusoid is

$$ e^{i\theta} = \cos(\theta) + i \sin(\theta) \tag{9.11} $$

If the circle in Figure 9.1 is at the centre of the complex plane, $e^{i\theta}$ is
the point along the circumference. This remarkable formula can be verified
using Taylor expansions of $e^{i\theta}$, $\sin(\theta)$, and $\cos(\theta)$.
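
Euler's formula can also be checked numerically in R. The lines below are a
minimal sketch; the angles in theta are arbitrary illustrative values and the
variable names are not part of the original text.

```r
# Check Euler's formula numerically for a few angles.
theta <- c(0, pi / 6, pi / 2, pi, 1.234)   # illustrative angles (arbitrary choice)
lhs <- exp(1i * theta)                      # complex exponential e^{i theta}
rhs <- cos(theta) + 1i * sin(theta)         # cos(theta) + i sin(theta)
euler_gap <- max(Mod(lhs - rhs))            # largest discrepancy between the two sides
on_unit_circle <- Mod(lhs)                  # modulus of each point on the circle
```

The discrepancy euler_gap should be at the level of rounding error, and every
modulus should equal 1, which is the sense in which the points lie on the
circumference of the unit circle.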

The DFT is usually calculated using the fast Fourier transform algorithm
(FFT), which is very efficient for long time series. The DFT of a time series of
length n, $\{x_t : t = 0, \ldots, n-1\}$, and its inverse transform (IDFT) are
defined by Equation (9.12) and Equation (9.13), respectively.

$$ X_m = \sum_{t=0}^{n-1} x_t e^{-2\pi i m t/n}, \qquad m = 0, \ldots, n-1 \tag{9.12} $$

$$ x_t = \frac{1}{n} \sum_{m=0}^{n-1} X_m e^{2\pi i m t/n}, \qquad t = 0, \ldots, n-1 \tag{9.13} $$


It is convenient to start the time series at t = 0 for these definitions
because m then corresponds to frequency $2\pi m/n$ radians per sampling interval.
The steps in the derivation of the DFT-IDFT transform pair are set out in
Exercise 4. The DFT is obtained in R with the function fft(), where x[t+1]
corresponds to $x_t$ and X[m+1] corresponds to $X_m$.


> set.seed(1)
> n <- 8
> x <- rnorm(n)
> x
[1] -0.626  0.184 -0.836  1.595  0.330 -0.820  0.487  0.738
> X <- fft(x)
> X
[1]  1.052+0.000i -0.852+0.007i  0.051+2.970i -1.060-2.639i
[5] -2.342+0.000i -1.060+2.639i  0.051-2.970i -0.852-0.007i
> fft(X, inverse = TRUE)/n
[1] -0.626-0i  0.184+0i -0.836-0i  1.595-0i  0.330+0i -0.820-0i
[7]  0.487+0i  0.738+0i
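
To see that fft() computes the sums in the definitions of the DFT and IDFT,
the two transforms can be evaluated term by term. The sketch below assumes the
sign and scaling conventions shown by the fft() example above (no scaling on
the forward transform, division by n on the inverse); the function names
dft_direct and idft_direct are hypothetical.

```r
# Direct evaluation of the DFT and IDFT sums, compared with fft().
dft_direct <- function(x) {
  n <- length(x)
  t <- 0:(n - 1)
  # X_m = sum_t x_t exp(-2 pi i m t / n), for m = 0, ..., n - 1
  sapply(0:(n - 1), function(m) sum(x * exp(-2i * pi * m * t / n)))
}

idft_direct <- function(X) {
  n <- length(X)
  m <- 0:(n - 1)
  # x_t = (1/n) sum_m X_m exp(2 pi i m t / n), for t = 0, ..., n - 1
  sapply(0:(n - 1), function(t) sum(X * exp(2i * pi * m * t / n)) / n)
}

set.seed(1)
x <- rnorm(8)
X_direct <- dft_direct(x)
gap_fft <- max(Mod(X_direct - fft(x)))              # agreement with fft()
gap_inverse <- max(Mod(idft_direct(X_direct) - x))  # round trip recovers x
```

Both gaps should be at the level of rounding error. The direct sums take of
the order of n^2 operations, which is why the FFT is preferred for long series.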


The complex form of Parseval’s Theorem, first given in Equation (9.4), is

$$ \sum_{t=0}^{n-1} x_t^2 = \sum_{m=0}^{n-1} |X_m|^2 / n \tag{9.14} $$

If n is even, the $|X_m|^2$ contribution to the variance corresponds to a
frequency of $2\pi m/n$ for $m = 1, \ldots, n/2$. For $m = n/2, \ldots, (n-1)$, the
frequencies are at or above the Nyquist frequency, $\pi$, and are aliased to the
frequencies $2\pi(m-n)/n$, which lie in the range $[-\pi, -2\pi/n]$. All but two of
the $X_m$ occur as complex conjugate pairs; that is, $X_{n-j} = X_j^*$ for
$j = 1, \ldots, n/2 - 1$. The following lines of R code give the spikes of the
Fourier line spectrum FL at frequencies in frq scaled so that FL[1] is
mean(x)^2 and the sum of FL[2], ..., FL[n/2+1] is (n-1)*var(x)/n.


> fq <- 2 * pi / n
> frq <- 0
> FL <- 0
> FL[1] <- X[1]^2 / n^2
> frq[1] <- 0
> for ( j in 2:(n/2) ) {
    FL[j] <- 2 * (X[j] * X[n+2-j]) / n^2
    frq[j] <- fq * (j-1)
  }
> FL[n/2 + 1] <- X[n/2 + 1]^2 / n^2
> frq[n/2 + 1] <- pi


If a plot is required, plot(frq,FL) can be used. You can now average spikes 
as you wish to obtain a spectrum (Exercise 5). 
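
The scaling claimed for the line spectrum can be verified directly. The
following sketch repeats the calculation for the same simulated series, using
Mod()^2 in place of the product of conjugate pairs so that the spike heights
are stored as real numbers; that substitution is a choice made here, not the
code in the text.

```r
# Check Parseval's Theorem and the scaling of the Fourier line spectrum.
set.seed(1)
n <- 8                       # even length, as assumed in the text
x <- rnorm(n)
X <- fft(x)

parseval_lhs <- sum(x^2)
parseval_rhs <- sum(Mod(X)^2) / n

# Line spectrum spikes; Mod()^2 keeps the values real rather than complex.
FL <- numeric(n / 2 + 1)
FL[1] <- Mod(X[1])^2 / n^2
for (j in 2:(n / 2)) {
  FL[j] <- 2 * Mod(X[j])^2 / n^2     # X[j] and X[n+2-j] are a conjugate pair
}
FL[n / 2 + 1] <- Mod(X[n / 2 + 1])^2 / n^2
frq <- 2 * pi * (0:(n / 2)) / n      # frequencies from 0 to pi

spike_total <- sum(FL[-1])           # compare with (n - 1) * var(x) / n
```

Here FL[1] agrees with mean(x)^2 and spike_total agrees with
(n - 1) * var(x) / n, so the spikes at non-zero frequencies partition the
variance of the series.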


9.8 The spectrum of a random process* 


Although we can now calculate the spectrum of a time series of finite length, 
we have no algebraic formula for defining the spectrum of the underlying 
random process. The definition of the spectrum of a random process follows 
from considering the expected value of a smoothed periodogram and is 


$$ \Gamma(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma_k e^{-i\omega k}, \qquad -\pi < \omega < \pi \tag{9.15} $$

The derivation of Equation (9.15) is given in §9.8.3.


9.8.1 Discrete white noise


The spectrum of discrete white noise with variance $\sigma^2$ is easily obtained
from the definition since the only non-zero value of $\gamma_k$ is $\sigma^2$ when
k = 0.

$$ \Gamma(\omega) = \frac{\sigma^2}{2\pi}, \qquad -\pi < \omega < \pi \tag{9.16} $$

The area under the spectrum is the variance $\sigma^2$.


9.8.2 AR 


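The spectrum of an ARMA(p, q) process is

$$ \Gamma(\omega) = \frac{\sigma^2}{2\pi} \left| \frac{1 + \sum_{l=1}^{q} \beta_l e^{-i\omega l}}{1 - \sum_{j=1}^{p} \alpha_j e^{-i\omega j}} \right|^2, \qquad -\pi < \omega < \pi \tag{9.17} $$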


It is far easier to derive Equation (9.17) from results we develop in Chapter 10, 
but we state the result here because it suggests another method of estimating 
the spectrum of a random process. 
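
As an illustration of how the ARMA spectrum might be evaluated, the sketch
below assumes the form $(\sigma^2/2\pi)\,|1 + \sum \beta_l e^{-i\omega l}|^2 / |1 - \sum \alpha_j e^{-i\omega j}|^2$
on $-\pi < \omega < \pi$. The function name arma_spectrum and the AR(1)
coefficient 0.5 are hypothetical choices made for the example.

```r
# Sketch of the ARMA(p, q) spectrum on (-pi, pi).
# alpha: AR coefficients, beta: MA coefficients, sigma2: white noise variance.
arma_spectrum <- function(omega, alpha = numeric(0), beta = numeric(0), sigma2 = 1) {
  sapply(omega, function(w) {
    num <- 1 + sum(beta * exp(-1i * w * seq_along(beta)))
    den <- 1 - sum(alpha * exp(-1i * w * seq_along(alpha)))
    sigma2 / (2 * pi) * Mod(num / den)^2
  })
}

# White noise: the spectrum is flat at sigma2 / (2 * pi).
wn_height <- arma_spectrum(c(-1, 0, 2), sigma2 = 4)

# AR(1) with alpha = 0.5: the area under the spectrum is compared with the
# process variance sigma2 / (1 - alpha^2) = 4/3.
ar1_area <- integrate(arma_spectrum, lower = -pi, upper = pi, alpha = 0.5)$value
```

The numerical integral ar1_area should be close to 4/3, consistent with the
area under the spectrum being the variance of the process.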


9.8.3 Derivation of spectrum 


Assume that $\{x_t : t = 0, \ldots, n-1\}$ is a time series with mean 0. The
contribution to the Fourier line spectrum at frequency m is

$$ |X_m|^2 = X_m X_m^* = \sum_{t=0}^{n-1} x_t e^{-2\pi i m t/n} \sum_{s=0}^{n-1} x_s e^{2\pi i m s/n} \tag{9.18} $$

which can be rewritten as

$$ \sum_{t=0}^{n-1} \sum_{s=0}^{n-1} x_t x_s e^{2\pi i m (s-t)/n} \tag{9.19} $$

We now substitute k = s - t and sum over t and k instead of t and s. Then
the double sum becomes

$$ \sum_{t=0}^{n-1} \sum_{k=-t}^{n-1-t} x_t x_{t+k} e^{2\pi i m k/n} \tag{9.20} $$

Since the mean of the time series is 0, the sum of $x_t x_{t+k}$ over t is
proportional to the sample autocovariance at lag k, $c_k$, so Equation (9.20)
is proportional to

$$ \sum_{k=-(n-1)}^{n-1} c_k e^{2\pi i m k/n} \tag{9.21} $$

This will not converge as n tends to infinity because as n increases we introduce
more spikes into the spectrum. However, if we take the expected value of
Equation (9.21), and let $n \to \infty$ so that

$$ E[c_k] \to \gamma_k \tag{9.22} $$

and define

$$ \lim_{n \to \infty} \frac{2\pi m}{n} = \omega \tag{9.23} $$

Equation (9.15) follows; because $\gamma_k = \gamma_{-k}$, the sum is unchanged when
the sign of the exponent is reversed. The factor $1/2\pi$ in Equation (9.15) is a
normalising factor that ensures that the area under the spectrum equals the
variance of the stochastic process.


9.9 Autoregressive spectrum estimation 


Another method for estimating the spectrum from a time series is to fit a 
suitable ARMA(p, q) model and then use Equation (9.17) to calculate the 
corresponding spectrum. It is usual to use a high order AR(p) model rather 
than the more general ARMA model, and this is an option with spectrum that 
is invoked by including method=c("ar") as an argument in the function. It 
gives a rather smooth estimate of the spectrum, increasingly so as p becomes 
smaller; it is used below on the wave tank data. The function determines a 
suitable order for the AR(p) model using the AIC; the span parameter is not 
needed. 


> spectrum( waveht, log = c("no"), method = c("ar") ) 


The smooth shape is useful for qualitative comparisons with the sea spectra
(Fig. 9.13). The analysis also indicates that we could use an AR(13) model 
to obtain realisations of time series with this same spectrum in computer 
simulations. A well-chosen probability distribution for the errors could be 
used to give a realistic simulation of extreme values in the series. 
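
The same call can be tried on simulated data when the wave tank series is not
to hand. The sketch below uses an AR(2) series generated with arima.sim; the
coefficients and the series length are illustrative assumptions, not values
taken from the wave tank analysis.

```r
# Sketch of AR spectrum estimation on simulated data (the wave tank series
# used in the text is not loaded here).
set.seed(1)
x <- arima.sim(n = 1000, model = list(ar = c(1, -0.5)))  # illustrative AR(2)

# method = "ar" fits an AR(p) model, with p chosen by AIC, and evaluates the
# corresponding spectrum; plot = FALSE just returns the estimate.
ar_spec <- spectrum(x, method = "ar", plot = FALSE)

ar_spec$method        # reports the order p that was selected
head(ar_spec$freq)    # frequencies in cycles per sampling interval
```

The returned object has the same structure as other spectrum estimates, so
plot(ar_spec) can be used to draw it.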


9.10 Finer details 


9.10.1 Leakage 


Suppose a time series is a sampled sine function at a specific frequency. If this 
frequency corresponds to one of the frequencies in the finite Fourier series, 
then there will be a spike in the Fourier line spectrum at this frequency. This 
coincidence is unlikely to arise by chance, so now suppose that the specific 
frequency lies between two of the frequencies in the finite Fourier series. There 
will not only be spikes at these two frequencies but also smaller spikes at 
neighbouring frequencies (Exercise 6). This phenomenon is known as leakage. 
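
Leakage is easy to demonstrate with fft. The sketch below compares a sine wave
at one of the Fourier frequencies with one that lies midway between two of
them; the record length of 64 and the frequencies of 8 and 8.5 cycles per
record are illustrative assumptions.

```r
# Leakage: compare a sine wave at a Fourier frequency with one that falls
# midway between two Fourier frequencies.
n <- 64
t <- 0:(n - 1)
line_spectrum <- function(x) Mod(fft(x))[2:(n / 2 + 1)]^2 / n^2

on_grid  <- sin(2 * pi * 8 * t / n)      # exactly 8 cycles in the record
off_grid <- sin(2 * pi * 8.5 * t / n)    # 8.5 cycles: between two frequencies

spikes_on  <- line_spectrum(on_grid)
spikes_off <- line_spectrum(off_grid)

# Number of frequencies carrying more than 1% of the largest spike.
count_on  <- sum(spikes_on  > 0.01 * max(spikes_on))    # a single spike
count_off <- sum(spikes_off > 0.01 * max(spikes_off))   # several spikes: leakage
```

For the on-grid signal only the spike at the eighth Fourier frequency is
non-negligible, whereas the off-grid signal spreads its variance over the two
adjacent frequencies and their neighbours.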

Fig. 9.13. Wave elevation series: spectrum calculated from fitting an AR model.


9.10.2 Confidence intervals 


Consider a frequency $\omega_0$ corresponding to a spike of the Fourier line
spectrum. If we average an odd number, L, of scaled spikes to obtain a smoothed
spectrum, then

$$ C(\omega_0) = \frac{1}{L} \sum_{l=-(L-1)/2}^{(L-1)/2} C_{RP}(\omega_l) \tag{9.24} $$

where $C_{RP}$ are the raw periodogram, scaled spike estimates. Now taking the
expectation of both sides of Equation (9.24), and assuming the raw periodogram
is unbiased for the population spectrum, we obtain

$$ E[C(\omega_0)] = \frac{1}{L} \sum_{l=-(L-1)/2}^{(L-1)/2} \Gamma(\omega_l) \tag{9.25} $$

Provided the population spectrum does not vary much over the interval
$[\omega_{-(L-1)/2}, \omega_{(L-1)/2}]$,

$$ E[C(\omega_0)] \approx \Gamma(\omega_0) \tag{9.26} $$

But notice that if $\omega_0$ corresponds to a peak or trough of the spectrum, the
smoothed spectrum will be biased low or high, respectively. The more the
smoothing, the more the bias. However, some smoothing is essential to reduce
the variability. The following heuristic argument gives an approximate
confidence interval for the spectrum. If we divide both sides of Equation (9.24)
by $\Gamma(\omega_0)$ and take the variance, we obtain

$$ \mathrm{Var}\left[C(\omega_0)/\Gamma(\omega_0)\right] \approx \frac{1}{L^2} \sum_{l=-(L-1)/2}^{(L-1)/2} \mathrm{Var}\left[C_{RP}(\omega_l)/\Gamma(\omega_l)\right] \tag{9.27} $$

where we have used the fact that spikes in the Fourier line spectrum are
independent, a consequence of Parseval's Theorem. Now each spike is an
estimate of variance at frequency $\omega_l$ based on 2 degrees of freedom. So,

$$ \frac{2 C_{RP}(\omega_l)}{\Gamma(\omega_l)} \sim \chi^2_2 \tag{9.28} $$

The variance of a chi-square distribution is twice its degrees of freedom. Hence,

$$ \mathrm{Var}\left[C(\omega_0)/\Gamma(\omega_0)\right] \approx \frac{1}{L} \tag{9.29} $$

A scaled sum of L chi-square variables, each with 2 degrees of freedom, is a
scaled chi-square variable with 2L degrees of freedom and well approximated
by a normal distribution. Thus an approximate 95% confidence interval for
$\Gamma(\omega)$ is

$$ \left[ \left(1 - \frac{2}{\sqrt{L}}\right) C(\omega), \; \left(1 + \frac{2}{\sqrt{L}}\right) C(\omega) \right] \tag{9.30} $$

We have dropped the subscript on $\omega$ because the result remains a good
approximation for estimates of the spectrum interpolated between $C(\omega_l)$.
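
The interval is simple to compute once the number of averaged spikes is
known. The sketch below assumes the interval has the form
$(1 \pm 2/\sqrt{L})\,C(\omega)$ implied by the variance 1/L and the normal
approximation above; the function name spec_ci and the example values are
hypothetical.

```r
# Approximate 95% interval for a spectrum estimate C obtained by averaging
# L scaled spikes.
spec_ci <- function(C, L) {
  half_width <- 2 / sqrt(L)
  cbind(lower = (1 - half_width) * C, upper = (1 + half_width) * C)
}

# Example: an estimate of 10 from averaging L = 25 spikes.
ci <- spec_ci(C = 10, L = 25)   # lower = 6, upper = 14
```

Because the half-width is proportional to $1/\sqrt{L}$, quadrupling the number
of averaged spikes halves the width of the interval, at the cost of more bias
near peaks and troughs.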


9.10.3 Daniell windows 


The function spectrum uses a modified Daniell window, or smoother, that
gives half weight to the end values. If more than one number is specified for 
the parameter span, it will use a series of Daniell smoothers, and the net result 
will be a centred moving average with weights decreasing from the centre. The 
rationale for using a series of smoothers is that it will decrease the bias. 
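
The weights themselves can be inspected with the kernel function in the stats
package. The sketch below shows a single modified Daniell smoother of the
smallest size and the result of applying it twice; the helper full_weights is
a hypothetical name introduced here.

```r
# Weights of modified Daniell smoothers, via kernel() in the stats package.
k1 <- kernel("modified.daniell", 1)         # one smoother spanning 3 values
k2 <- kernel("modified.daniell", c(1, 1))   # the same smoother applied twice

# Reconstruct the full symmetric set of weights from the stored upper half.
full_weights <- function(k) c(rev(k$coef[-1]), k$coef)

w1 <- full_weights(k1)   # 0.25 0.50 0.25: half weight at the end values
w2 <- full_weights(k2)   # 0.0625 0.2500 0.3750 0.2500 0.0625
```

The second set of weights is a centred moving average whose weights decrease
from the centre, as described above.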


9.10.4 Padding 


The simplest FFT algorithm assumes that the time series has a length that is
some power of 2. A positive integer is highly composite if it has more divisors
than any smaller positive integer. The FFT algorithm is most efficient when
the length n is highly composite, and by default spec.pgram pads the
mean-adjusted time series with zeros to reach the smallest highly composite
number that is greater than or equal to the length of the time series. Padding
can be avoided by setting the parameter fast=FALSE. A justification for padding
is that the length of the time series is arbitrary and that adding zeros has no
effect on the frequency composition. Adding zeros does reduce the variance,
and this must be remembered when scaling the spectrum, so that its area
equals the variance of the original time series.
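
The padded length can be previewed with the function nextn in the stats
package. The sketch below assumes, following the R documentation for
spec.pgram and nextn, that the padded length is the smallest product of powers
of 2, 3, and 5 that is at least the series length; the series lengths are
illustrative.

```r
# Padded lengths for a few series lengths, assuming fast = TRUE pads to nextn(n).
n <- c(127, 128, 131, 1000)
padded <- nextn(n)   # smallest products of powers of 2, 3 and 5 that are >= n
padded               # 128 128 135 1000
```

A length of 131, for example, would be padded with four zeros to reach
135 = 3^3 x 5, whereas a length of 1000 = 2^3 x 5^3 needs no padding.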


9.10.5 Tapering 


The length of a time series is not usually related to any underlying frequency 
composition. However, the discrete Fourier series keeps replicating the original 
time series as −∞ < t < ∞, known as periodic extension of the original time
series, and there will usually be a jump between the end of one replicate time 
series and the start of the next. These jumps can be avoided by reducing 
the magnitude of the values of the time series, relative to its mean, at the 
beginning and towards the end. The default with spectrum is a taper applied 
to 10% of the data at the beginning and towards the end of the time series. 
Tapering increases the variance of Fourier line spectrum spikes but reduces 
the bias (Exercise 7). It will also reduce the variance of the time series. The 
default proportion of data to which the taper is applied can be changed with 
the parameter taper. The fft function does not remove the mean, remove 
a linear trend, or apply a taper, operations that are generally classed as pre- 
processing. 
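
The taper weights can be made visible by applying spec.taper, from the stats
package, to a constant series of ones. This is a minimal sketch; the series
length of 20 is an arbitrary choice.

```r
# Effect of a 10% split cosine bell taper, using spec.taper() on a constant
# series of ones so that the returned values are the taper weights.
n <- 20
w <- spec.taper(rep(1, n), p = 0.1)   # taper applied to 10% of each end
round(w[1:3], 4)                       # 0.1464 0.8536 1.0000
```

With n = 20 the taper affects the first two and last two values only; the
remaining values are left unchanged. Since fft applies no taper, a call such
as fft(spec.taper(x - mean(x), p = 0.1)) is one way to add these
pre-processing steps by hand.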


9.10.6 Spectral analysis compared with wavelets 


Spectral analysis is appropriate for the analysis of stationary time series and 
for identifying periodic signals that are corrupted by noise. Spectral analy- 
sis can be used for spatial series such as surface roughness transects, and 
two-dimensional spectral analysis can be used for measurements of surface 
roughness made over a plane. However, spectral analysis is not suitable for 
non-stationary applications. 

In contrast, wavelets have been developed to summarise the variation in 
frequency composition through time or over space. There are many applica- 
tions, including compression of digital files of images and in speech recognition 
software. Nason (2008) provides an introduction to wavelets using the R pack- 
age WaveThresh4. 


9.11 Summary of additional commands used 

spectrum     returns the spectrum
spec.pgram   returns the spectrum with more control of parameters
fft          returns the DFT


9.12 Exercises


1. Refer to §9.3.1 and take n = 128.
   a) Use R to calculate $\cos(2\pi t/n)$, $\sin(2\pi t/n)$, and $\cos(4\pi t/n)$ for
      t = 1,...,n. Calculate the three variances and the three correlations.
   b) Assuming the results above generalise, provide an explanation for
      Parseval’s Theorem.
   c) Explain why the $A^2_{n/2}$ term in Equation (9.4) is not divided by 2.


2. Repeat the investigation of realisations from AR processes in §9.4 using
   random deviates from an exponential distribution with parameter 1 and
   with its mean subtracted, rather than the standard normal distribution.


3. The differential equation for the oscillatory response x of a lightly damped
   single mode of vibration system, such as a mass on a spring, with a forcing
   term w is

   $$ \ddot{x} + 2\zeta\Omega\dot{x} + \Omega^2 x = w $$

   where $\zeta$ is the damping coefficient, which must be less than 1 for an
   oscillatory response, and $\Omega$ is the natural frequency. Approximate the
   derivatives by backward differences:

   $$ \ddot{x} = x_t - 2x_{t-1} + x_{t-2}, \qquad \dot{x} = x_t - x_{t-1} $$

   and set $w = w_t$ and rearrange to obtain the form of the AR(2) process in
   §8.4.4. Consider an approximation using central differences.


4. Suppose that

   $$ x_t = \sum_{m=0}^{n-1} a_m e^{2\pi i m t/n}, \qquad t = 0, \ldots, n-1 \tag{9.31} $$

   for some coefficients $a_m$ that we wish to determine. Now multiply both
   sides of this equation by $e^{-2\pi i j t/n}$ and sum over t from 0 to n - 1
   to obtain

   $$ \sum_{t=0}^{n-1} x_t e^{-2\pi i j t/n} = \sum_{t=0}^{n-1} \sum_{m=0}^{n-1} a_m e^{2\pi i (m-j) t/n} \tag{9.32} $$

   Consider a fixed value of j. Notice that the sum over t to the right of
   $a_m$ is a geometric series with sum 0 unless m = j. This is Equation (9.12)
   expressed in terms of $n a_j$ in place of $X_j$, so the two differ by a
   factor of n.


5. Write R code to average an odd number of spike heights obtained from 
fft and hence plot a spectrum. 

6. Sample the three signals
   a) $\sin(\pi t/2)$
   b) $\sin(3\pi t/4)$
   c) $\sin(5\pi t/8)$
   at times t = 0,...,7, using fft to compare their line spectra.


7. Sample the signal $\sin(11\pi t/32)$ for t = 0,...,31. Use fft to calculate
   the Fourier line spectrum. The cosine bell taper applied to the beginning
   $\alpha$ and ending $\alpha$ of a series is defined by

   $$ \tfrac{1}{2}\left[1 - \cos\left(\pi\{t + 0.5\}/\{\alpha n\}\right)\right] x_t, \qquad (t + 1) \le \alpha n $$

   $$ \tfrac{1}{2}\left[1 - \cos\left(\pi\{n - t - 0.5\}/\{\alpha n\}\right)\right] x_t, \qquad (t + 1) > (1 - \alpha) n $$

   Investigate the effect of this taper, with $\alpha = 0.1$, on the Fourier
   line spectrum of the sampled signal.


8. Sea spectra are sometimes modelled by the Pierson-Moskowitz spectrum,
   which has the form below and is usually only appropriate for deep water
   conditions.

   $$ \Gamma(\omega) = a \omega^{-5} e^{-b \omega^{-4}}, \qquad 0 \le \omega \le \pi $$

   Plot the Pierson-Moskowitz spectrum in R for a few choices of parameters
   a and b. Compare it with the wave elevation spectra (Fig. 9.7).


10 System Identification


10.1 Purpose 


Vibration is defined as an oscillatory movement of some entity about an equi- 
librium state. It is the means of producing sound in musical instruments, it 
is the principle underlying the design of loudspeakers, and it describes the 
response of buildings to earthquakes. The squealing of disc brakes on a car 
is caused by vibration. The up and down motion of a ship at sea is a low- 
frequency vibration. Spectral analysis provides the means for understanding 
and controlling vibration. 

Vibration is generally caused by some external force acting on a system, 
and the relationship between the external force and the system response can 
be described by a mathematical model of the system dynamics. We can use 
spectral analysis to estimate the parameters of the mathematical model and 
then use the model to make predictions of the response of the system under 
different forces. 


10.2 Identifying the gain of a linear system 


10.2.1 Linear system 


We consider systems that have clearly defined inputs and outputs, and aim 
to deduce the system from measurements of the inputs and outputs or to 
predict the output knowing the system and the input. Attempts to under- 
stand economies and to control inflation by increasing interest rates provide 
ambitious examples of applications of these principles. 

P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R,
Use R, DOI 10.1007/978-0-387-88698-5_10,
© Springer Science+Business Media, LLC 2009

A mathematical model of a dynamic system is linear if the output to a
sum of input variables, x and y, equals the sum of the outputs corresponding
to the individual inputs. More formally, a mathematical operator $\mathcal{L}$ is
linear if it satisfies

$$ \mathcal{L}(ax + by) = a\mathcal{L}(x) + b\mathcal{L}(y) $$

where a and b are constants. For a linear system, the output response to a
sine wave input is a sine wave of the same frequency with an amplitude that 
is proportional to the amplitude of the input. The ratio of the output ampli- 
tude to the input amplitude, known as the gain, and the phase lag between 
input and output depend on the frequency of the input, and this dependence 
provides a complete description of a linear system. 

Many physical systems are well approximated by linear mathematical mod- 
els, provided the input amplitude is not excessive. In principle, we can identify 
a linear model by noting the output, commonly referred to as the response, 
to a range of sine wave inputs. But there are practical limitations to such a 
procedure. In many cases, while we may be able to measure the input, we 
certainly cannot specify it. Examples are wave energy devices moored at sea, 
and the response of structures to wind forcing. Even when we can specify the 
input, recording the output over a range of frequencies is a slow procedure. In 
contrast, provided we can measure the input and output, and the input has 
a sufficiently broad spectrum, we can identify the linear system from spectral 
analysis. Also, spectral methods have been developed for non-linear systems. 

A related application of spectral analysis is that we can determine the 
spectrum of the response if we know the system and the input spectrum. 
For example, we can predict the output of a wave energy device if we have 
a mathematical model for its dynamics and know typical sea spectra at its 
mooring. 


10.2.2 Natural frequencies 


If a system is set in motion by an initial displacement or impact, it may oscil- 
late, and this oscillation takes place at the natural frequency (or frequencies) 
of the system. A simple example is the oscillation of a mass suspended by 
a spring. Linear systems have large gains at natural frequencies and, if large 
oscillations are undesirable, designers need to ensure that the natural frequen- 
cies of the system are far removed from forcing frequencies. Alternatively, in 
the case of wave energy devices, for example, the designer may aim for the 
natural frequencies of the device to match predominant frequencies in the sea 
spectrum. A common example of forcing a system at its natural frequency is 
pushing a child on a swing. 


10.2.3 Estimator of the gain function 


If a linear system is forced by a sine wave of amplitude A at frequency f, 
the response has an amplitude G(f)A, where G(f) is the gain at frequency 
f. The ratio of the variance of the output to the variance of the input, for 
sine waves at this frequency, is G(f)?. If the input is a stationary random 
process rather than a single sine wave, its variance is distributed over a range 
of frequencies, and this distribution is described by the spectrum. It seems 
intuitively reasonable to estimate the square of the gain function by the ratio 
of the output spectrum to the input spectrum. Consider a linear system with
a single input, $x_t$, and a single output, $y_t$. The gain function can be
estimated by

$$ \hat{G}(f) = \sqrt{\frac{C_{yy}(f)}{C_{xx}(f)}} \tag{10.1} $$

A corollary is that, if the gain function is known or has been estimated and
the input spectrum has been estimated, the output spectrum can be estimated by

$$ C_{yy} = G^2 C_{xx} \tag{10.2} $$

Equation (10.2) also holds if spectra are expressed in radians rather than
cycles, in which case the gain is a function $G(\omega)$ of $\omega$.


10.3 Spectrum of an AR(p) process 


Consider the deterministic part of an AR(p) model with a complex sinusoid
input,

$$ x_t - \alpha_1 x_{t-1} - \ldots - \alpha_p x_{t-p} = e^{i\omega t} \tag{10.3} $$

Assume a solution for $x_t$ of the form $A e^{i\omega t}$, where A is a complex
number, and substitute this into Equation (10.3) to obtain

$$ A = \left(1 - \alpha_1 e^{-i\omega} - \ldots - \alpha_p e^{-ip\omega}\right)^{-1} \tag{10.4} $$

The gain function, expressed as a function of $\omega$, is the absolute value of A.
Now consider a discrete white noise input, $w_t$, in place of the complex
sinusoid. The system is now an AR(p) process. Applying Equation (10.2), with
population spectra rather than sample spectra, and noting that the spectrum of
white noise with unit variance is $1/\pi$ when expressed over $0 \le \omega < \pi$
(§9.8.1), gives

$$ \Gamma(\omega) = |A|^2 \Gamma_{ww} = \frac{1}{\pi} \left|1 - \alpha_1 e^{-i\omega} - \ldots - \alpha_p e^{-ip\omega}\right|^{-2}, \qquad 0 \le \omega < \pi \tag{10.5} $$

The deterministic part of an AR(p) model is a linear difference equation of
order p.


10.4 Simulated single mode of vibration system 


The simplest linear model (SI units in parentheses) for a vibrating system
is that of a mass m (kg) on a spring of stiffness k (N m⁻¹) with a damper
characterised by a damping coefficient c (N s m⁻¹). Denote the displacement
of the mass by y, differentiation with respect to time by a dot placed above
it, and the forcing term by x. If we apply Newton's second law of motion,
equating the product of mass and acceleration with the forces acting on the
mass, we obtain:

$$ m\ddot{y} + c\dot{y} + ky = x \tag{10.6} $$

Equation (10.6) has the same form as that given in Exercise 1: the undamped
natural frequency is $\sqrt{k/m}$ and the damping coefficient is $c/(2\sqrt{km})$.
The gain is given by

$$ G(\omega) = \left[(k - m\omega^2)^2 + c^2\omega^2\right]^{-1/2} \tag{10.7} $$

and the damped natural frequency, which for light damping corresponds closely
to the maximum gain, is given by

$$ \sqrt{\frac{k}{m}\left(1 - \frac{c^2}{4km}\right)} \tag{10.8} $$

These results can be derived by substituting $x = \sin(\omega t)$ and
$y = G\sin(\omega t - \psi)$ into Equation (10.6) (also see Exercise 1).
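
These formulas can be evaluated for the parameter values m = 1, c = 1, and
k = 16.25 that are used in the simulation later in this section. The sketch
below assumes the gain $[(k - m\omega^2)^2 + c^2\omega^2]^{-1/2}$ and the damped
natural frequency $\sqrt{(k/m)(1 - c^2/(4km))}$; the name c_damp is used for the
damping coefficient only to avoid masking the R function c.

```r
# Numerical check of the quantities quoted for m = 1, c = 1, k = 16.25.
m <- 1; c_damp <- 1; k <- 16.25           # c_damp stands for the coefficient c

# Gain of the single mode system as a function of frequency (radians per second).
gain <- function(omega) 1 / sqrt((k - m * omega^2)^2 + c_damp^2 * omega^2)

omega_n <- sqrt(k / m)                    # undamped natural frequency
zeta    <- c_damp / (2 * sqrt(k * m))     # damping coefficient
omega_d <- omega_n * sqrt(1 - zeta^2)     # damped natural frequency

round(c(omega_n, zeta, omega_d, gain(omega_d)), 3)   # 4.031 0.124 4.000 0.250
```

The values agree with the undamped natural frequency of 4.03, damping
coefficient of 0.124, damped natural frequency of 4 radians per second, and
gain of 0.250 quoted for the simulated system.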

Equation (10.6) represents a single mode of vibration because there is a 
single mass that is constrained to move in a straight line without rotating. The 
equation is a good model for a pendulum making small oscillations. It might 
be a reasonable model for vibration of a street lamp on a metal pole in gusts 
of wind from the same direction. It would be a poor model for vibration of a 
violin string because it could only describe the fundamental mode shape and 
would miss all of the harmonics and overtones. Nevertheless, Equation (10.6) 
is a widely used approximation in vibration analysis. Thomson (1993) is a 
nice introduction to the theory of vibration. 

If we take a time step of $\Delta$ (that is, a small fraction of the unit of
time) we can approximate derivatives by backward differences. Thus Equation
(10.6) can be approximated by the difference equation

$$ a_0 y_t + a_1 y_{t-1} + a_2 y_{t-2} = x_t \tag{10.9} $$

where

$$ a_0 = \frac{m}{\Delta^2} + \frac{c}{\Delta} + k, \qquad a_1 = -\frac{2m}{\Delta^2} - \frac{c}{\Delta}, \qquad a_2 = \frac{m}{\Delta^2} $$

The following short R script investigates a difference equation approximation
to a lightly damped system represented by Equation (10.6) with m = 1,
c = 1, and k = 16.25. The undamped natural frequency is 4.03 and the
damping coefficient is 0.124, so the damped natural frequency is 4 radians
per second, assuming time is measured in seconds. The maximum gain is
obtained by substituting the damped natural frequency into Equation (10.7) and
is 0.250. Equation (10.6) is approximated with Equation (10.9). The input $x_t$
is an AR(2) process with $\alpha_1$ and $\alpha_2$ set at 1 and -0.5, respectively,
driven by Gaussian white noise with unit variance. The sampling rate is 100 per
second, so the spectrum is defined from 0 to 50 Hz. The record length, n, is
100,000 and R calculates the spectrum at 50,000 points. The natural frequency
is around 0.64 Hz, so the gain function is only plotted up to a frequency of
5 Hz. The arrays Freq, FreH, Omeg, and OmegH contain the discretized
frequencies in cycles per sampling interval, Hz, radians per sampling interval,
and radians per second, respectively. Gth is the theoretical gain of the linear
system. Gemp is the empirical estimate of the gain calculated as the square
root of the ratio of the output spectrum to the input spectrum. Gar is the
theoretical gain of the difference equation approximation, and it is
indistinguishable from the empirical estimate (Fig. 10.1). As the signals are
noise-free this is not surprising. You are asked to investigate the effects of
adding noise to the input and output signals in Exercise 2. The empirical
estimate of the gain identifies the natural frequency accurately but slightly
underestimates the maximum gain, and you are asked to investigate possible
reasons for this in Exercise 3.


> m <- 1; c <- 1; k <- 16.25; Delta <- 0.01
> a0 <- m / Delta^2 + c / Delta + k
> a1 <- -2 * m / Delta^2 - c / Delta; a2 <- m / Delta^2
> n <- 100000
> y <- c(0, 0); x <- c(0, 0)
> set.seed(1)
> for (i in 3:n) {
    x[i] <- x[i-1] - 0.5 * x[i-2] + rnorm(1)
    y[i] <- (-a1 * y[i-1] - a2 * y[i-2]) / a0 + x[i] / a0
  }
> Sxx <- spectrum(x, span = 31)
> Syy <- spectrum(y, span = 31)
> Gemp <- sqrt( Syy$spec[1:5000] / Sxx$spec[1:5000] )
> Freq <- Syy$freq[1:5000]
> FreH <- Freq / Delta
> Omeg <- 2 * pi * Freq
> OmegH <- 2 * pi * FreH
> Gth <- sqrt( 1/( (k-m*OmegH^2)^2 + c^2*OmegH^2 ))
> Gar <- 1 / abs( 1 + a1/a0 * exp(-Omeg*1i) + a2/a0 * exp(-Omeg*2i) )
> plot(FreH, Gth, xlab = "Frequency (Hz)", ylab = "Gain", type = "l")
> lines(FreH, Gemp, lty = "dashed")
> lines(FreH, Gar, lty = "dotted")

Fig. 10.1. Gain of single-mode linear system. The theoretical gain is shown by
a solid line and the estimate made from the spectra obtained from the difference
equation is shown by a broken line. The theoretical gain of the difference equation
is plotted as a dotted line and coincides exactly with the estimate.


10.5 Ocean-going tugboat


The motion of ships and aircraft is described by displacements along the
orthogonal x, y, and z axes and rotations about these axes. The displacements
are surge, sway, and heave along the x, y, and z axes, respectively. The
rotations about the x, y, and z axes are roll, pitch, and yaw, respectively
(Fig. 10.2). So, there are six degrees of freedom for a ship's motion in the
ocean, and there are six natural frequencies. However, the natural frequencies
will not usually correspond precisely to the displacements and rotations, as
there is a coupling between displacements and rotations. This is typically most
pronounced between heave and pitch. There will be a natural frequency with
a corresponding mode that is predominantly heave, with a slight pitch, and
another natural frequency that is predominantly pitch, with a slight heave.

Naval architects will start with computer designs and then proceed to 
model testing in a wave tank before building a prototype. They will have a 
good idea of the frequency response of the ship from the models, but this will 
have to be validated against sea trials. Here, we analyse some of the data from 
the sea trials of an ocean-going tugboat. The ship sailed over an octagonal 
course, and data were collected on each leg. There was an impressive array 
of electronic instruments and, after processing analog signals through anti- 
aliasing filters, data were recorded at 0.5s intervals for roll (degrees), pitch 
(degrees), heave (m), surge (m), sway (m), yaw (degrees), wave height (m), 
and wind speed (knots). 


Fig. 10.2. Orthogonal axes for describing motion of a ship. Heave and pitch are
shown by block arrows.


> www <- "http://www.massey.ac.nz/~pscowper/ts/leg4.dat"
> tug.dat <- read.table(www, header = T)
> attach(tug.dat)
> Heave.spec <- spectrum( Heave, span = sqrt( length(Heave) ),
    log = c("no"), main = "" )
> Wave.spec <- spectrum( Wave, span = sqrt( length(Heave) ),
    log = c("no"), main = "" )
> G <- sqrt(Heave.spec$spec/Wave.spec$spec)
> par(mfcol = c(2, 2))
> plot( as.ts(Wave) )
> acf(Wave)
> spectrum(Wave, span = sqrt(length(Heave)), log = c("no"), main = "")
> plot(Heave.spec$freq, G, xlab = "frequency Hz", ylab = "Gain", type = "l")


Figure 10.3 shows the estimated wave spectrum and the estimated gain 
from wave height to heave. The natural frequencies associated with the 
heave/pitch modes are estimated as 0.075 Hz and 0.119 Hz, and the cor- 
responding gains from wave to heave are 0.15179 and 0.1323. In theory, the 
gain will approach 1 as the frequency approaches 0, but the sea spectrum has 
negligible components very close to 0, and no sensible estimate can be made. 
Also, the displacements were obtained by integrating accelerometer signals, 
and this is not an ideal procedure at very low frequencies. 


Fig. 10.3. Gain of heave from wave. [Four panels: time plot of the wave series
(Time); correlogram (Lag); smoothed spectrum (Frequency, bandwidth = 0.0052);
estimated gain (Frequency Hz).]


10.6 Non-linearity


There are several reasons why the hydrodynamic response of a ship will not
be precisely linear. In particular, the varying cross-section of the hull
accounts for non-linear buoyancy forces. Metcalfe et al. (2007) investigate this
by fitting a regression of the heave response on lagged values of the response,
squares, and cross-products of these lagged values, wave height, and wind speed.
The probing method looks at the response of the fitted model to the sum of two
complex sinusoids at frequencies $\omega_1$ and $\omega_2$. The non-linear response
can be shown as a three-dimensional plot of the gain surface against frequency
$\omega_1$ and $\omega_2$ or by a contour diagram. However, in this particular
application the gain associated with the non-linear terms was small compared
with the gain of the linear terms (Metcalfe et al., 2007). This is partly
because the model was fitted to data taken when the ship was in typical weather
conditions; under extreme conditions, when capsizing is likely, linear models
are inadequate.


10.7 Exercises 


1. The differential equation that describes the motion of a linear system with
   a single mode of vibration, such as a mass on a spring, has the general
   form

   $$ \ddot{y} + 2\zeta\Omega\dot{y} + \Omega^2 y = x $$

   The parameter $\Omega$ is the undamped natural frequency, and the parameter
   $\zeta$ is the damping coefficient. The response is oscillatory if $\zeta < 1$.

   a) Refer to Equation (10.7) and express $\zeta$ and $\Omega$ in terms of m, c,
      and k.

   b) Suppose there is no forcing term (x = 0), assume that $y = e^{mt}$ (where
      m now denotes an unknown constant rather than the mass), and substitute
      into the general form of the differential equation. Show that
      $m = -\zeta\Omega \pm i\sqrt{\Omega^2(1 - \zeta^2)}$. The damped natural
      frequency is $\Omega\sqrt{1 - \zeta^2}$.

   c) Take the initial condition of the unforced system as y = 1 when t = 0.
      Find the solution for y, and explain why this is referred to as the
      transient response.

   d) Now consider a periodic forcing term $x = e^{i\omega t}$. Write the steady
      state response, y, as $y = A e^{i(\omega t + \psi)}$. Substitute into the
      general form of the differential equation and show that

      $$ A = \left[(\Omega^2 - \omega^2)^2 + 4\zeta^2\Omega^2\omega^2\right]^{-1/2} $$

      $$ \tan(\psi) = -\frac{2\zeta\omega\Omega}{\Omega^2 - \omega^2} $$

. Refer to the R script in 810.4, which compares a difference equation ap- 
proximation to a model of a mass vibrating on a spring with the theoretical 
results. Insert another loop after that in lines 6-10 to simulate measure- 
ment noise added to the input x and y: 


for (i in 1:n) 1 
x[i] «- x[i] * nax * rnorm(1) 
yli] <- yli] + nay * rnorm(1) 
} 


Note that you also need to specify numerical values for the noise ampli- 

tudes, nax and nay, earlier in your script. 

a) Why does the addition of noise need to be put in a separate loop? 

b) How does the addition of white measurement noise to the output, but 
not the input, affect the estimate of the spectrum? 

c) How does the addition of independent white measurement noise to 
both input and output affect the estimate of the spectrum? 


3. The difference equation approximation used in §10.4 underestimates the
maximum gain.

a) Investigate the effect of the span parameter. 

b) Investigate the effect of increasing the sampling rate to 1000 per sec- 
ond. 

c) Investigate the effect of using a centred difference approximation to 


11


Multivariate Models


11.1 Purpose 


Data are often collected on more than one variable. For example, in economics, 
daily exchange rates are available for a large range of currencies, or, in hydro- 
logical studies, both rainfall and river flow measurements may be taken at a 
site of interest. In Chapter 10, we considered a frequency domain approach 
where variables are classified as inputs or outputs to some system. In this 
chapter, we consider time domain models that are suitable when measure- 
ments have been made on more than one time series variable. We extend the 
basic autoregressive model to the vector autoregressive model, which has more 
than one dependent time series variable, and look at methods in R for fitting 
such models. We consider series, called cointegrated series, that share an un- 
derlying stochastic trend, and look at suitable statistical tests for detecting 
cointegration. Since variables measured in time often share similar properties, 
regression can be used to relate the variables. However, regression models of 
time series variables can be misleading, so we first consider this problem in 
more detail before moving on to suitable models for multivariate time series. 


11.2 Spurious regression 


It is common practice to use regression to explore the relationship between 
two or more variables, and we usually seek predictor variables that either 
directly cause the response or provide a plausible physical explanation for it. For
time series variables we have to be particularly careful before ascribing any 
causal relationship since an apparent relationship could exist due to common 
extraneous factors that give rise to an underlying trend or simply because both 
series exhibit seasonal fluctuations. For example, the Australian electricity 
and chocolate production series share an increasing trend (see the following 
code) due to an increasing Australian population, but this does not imply that 
changes in one variable cause changes in the other. 


P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R, 211 
Use R, DOI 10.1007/978-0-387-88698-5_11, 
© Springer Science+Business Media, LLC 2009 
> www <- "http://www.massey.ac.nz/~pscowper/ts/cbe.dat"
> CBE <- read.table(www, header = T)
> Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
> Choc.ts <- ts(CBE[, 1], start = 1958, freq = 12)
> plot(as.vector(aggregate(Choc.ts)), as.vector(aggregate(Elec.ts)))
> cor(aggregate(Choc.ts), aggregate(Elec.ts))
[1] 0.958


The high correlation of 0.96 and the scatter plot do not imply that the elec- 
tricity and chocolate production variables are causally related (Fig. 11.1). In- 
stead, it is more plausible that the increasing Australian population accounts 
for the increasing trend in both series. Although we can fit a regression of 
one variable as a linear function of the other, with added random variation, 
such regression models are usually termed spurious because of the lack of any 
causal relationship. In this case, it would be far better to regress the variables 
on the Australian population. 
Fig. 11.1. Annual electricity and chocolate production plotted against each other.


The term spurious regression is also used when underlying stochastic 
trends in both series happen to be coincident, and this seems a more appro- 
priate use of the term. Stochastic trends are a feature of an ARIMA process 
with a unit root (i.e., B = 1 is a solution of the characteristic equation). We 
illustrate this by simulating two independent random walks: 
> set.seed(10); x <- rnorm(100); y <- rnorm(100)
> for(i in 2:100) {
    x[i] <- x[i-1] + rnorm(1)
    y[i] <- y[i-1] + rnorm(1) }


> plot(x, y) 
> cor(x, y) 
[1] 0.904 


The code above can be repeated for different random number seeds, though
you will only sometimes notice spurious correlation. The seed value of 10 was 
selected to provide an example of a strong correlation that could have resulted 
by chance. The scatter plot shows how two independent time series variables 
might appear related when each variable is subject to stochastic trends (Fig. 
11.2). 
Fig. 11.2. The values of two independent simulated random walks plotted against
each other. (See the code in the text.) 
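
How often a spurious correlation of this kind arises can be explored by
repeating the simulation many times. The following is a minimal sketch under
our own assumptions (the seed, the 500 repetitions, and the threshold of 0.5
are arbitrary choices, not values from the text). It records the correlation
between each pair of independent random walks and, for comparison, the
correlation between their first differences.

```r
# Sketch: distribution of the sample correlation between pairs of
# independent random walks (all settings below are illustrative assumptions)
set.seed(1)                       # arbitrary seed, for reproducibility only
n.sim <- 500                      # number of simulated pairs
n <- 100                          # length of each random walk
r.level <- r.diff <- numeric(n.sim)
for (s in 1:n.sim) {
  x <- cumsum(rnorm(n))           # random walk: cumulative sum of white noise
  y <- cumsum(rnorm(n))           # a second, independent random walk
  r.level[s] <- cor(x, y)               # correlation of the levels
  r.diff[s]  <- cor(diff(x), diff(y))   # correlation after differencing
}
# Proportion of pairs with a correlation exceeding 0.5 in absolute value
mean(abs(r.level) > 0.5)
mean(abs(r.diff) > 0.5)
# Central 95% range of the correlations between the levels
quantile(r.level, c(0.025, 0.975))
```

Large correlations between the levels should be fairly common, whereas the
differenced series, which are white noise, should rarely if ever show them.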


Stochastic trends are common in economic series, and so considerable care 
is required when trying to determine any relationships between the variables 
in multiple economic series. It may be that an underlying relationship can be 
justified even when the series exhibit stochastic trends because two series may 
be related by a common stochastic trend. 

For example, the daily exchange rate series for UK pounds, the Euro, and 
New Zealand dollars, given for the period January 2004 to December 2007, 
are all per US dollar. The correlogram plots of the differenced UK and EU 
series indicate that both exchange rates can be well approximated by random
walks (Fig. 11.3), whilst the scatter plot of the rates shows a strong linear 
relationship (Fig. 11.4), which is supported by a high correlation of 0.95. Since 
the United Kingdom is part of the European Economic Community (EEC), 
any change in the Euro exchange rate is likely to be apparent in the UK 
pound exchange rate, so there are likely to be fluctuations common to both 
series; in particular, the two series may share a common stochastic trend. We 
will discuss this phenomenon in more detail when we look at cointegration in 
§11.4. 


> www <- "http://www.massey.ac.nz/~pscowper/ts/us_rates.dat" 
> xrates <- read.table(www, header = T) 
> xrates[1:3, ] 


UK NZ EU 
1 0.558 1.52 0.794 
2 0.553 1.49 0.789 
3 0.548 1.49 0.783 


> acf( diff(xrates$UK) )
> acf( diff(xrates$EU) )
> plot(xrates$UK, xrates$EU, pch = 4)
> cor(xrates$UK, xrates$EU)
[1] 0.946
11.3 Tests for unit roots


When investigating any relationship between two time series variables we 
should check whether time series models that contain unit roots are suitable. 
If they are, we need to decide whether or not there is a common stochastic 
trend. The first step is to see how well each series can be approximated as 
a random walk by looking at the correlogram of the differenced series (e.g., 
Fig. 11.3). Whilst this may work for a simple random walk, we have seen in 
Chapter 7 that stochastic trends are a feature of any time series model with 
a unit root B = 1 as a solution of the characteristic equation, which would 
include more complex ARIMA processes. 

Dickey and Fuller developed a test of the null hypothesis that $\alpha = 1$ against
an alternative hypothesis that $\alpha < 1$ for the model $x_t = \alpha x_{t-1} + u_t$ in which
$u_t$ is white noise. A more general test, which is known as the augmented
Dickey-Fuller test (Said and Dickey, 1984), allows the differenced series $u_t$
to be any stationary process, rather than white noise, and approximates the
stationary process with an AR model. The method is implemented in R by 
the function adf.test within the tseries library. The null hypothesis of a 
unit root cannot be rejected for our simulated random walk x: 


> library(tseries)
> adf.test(x)

        Augmented Dickey-Fuller Test

data:  x
Dickey-Fuller = -2.23, Lag order = 4, p-value = 0.4796
alternative hypothesis: stationary


Fig. 11.3. Correlograms of the differenced exchange rate series: (a) UK rate; (b)
EU rate.


This result is not surprising since we would only expect 5% of simulated 
random walks to provide evidence against a null hypothesis of a unit root 
at the 5% level. However, when we analyse physical time series rather than
realisations from a known model, we should never mistake lack of evidence 
against a hypothesis for a demonstration that the hypothesis is true. The test 
result should be interpreted with careful consideration of the length of the 
time series, which determines the power of the test, and the general context. 
The null hypothesis of a unit root is favoured by economists because many 
financial time series are better approximated by random walks than by a 
stationary process, at least in the short term. 
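
The idea behind the basic (unaugmented) Dickey-Fuller test can be sketched in
base R. Writing the model as $x_t - x_{t-1} = (\alpha - 1)x_{t-1} + u_t$, the null
hypothesis $\alpha = 1$ corresponds to a zero coefficient in a regression of the
differenced series on the lagged series. The sketch below is an illustration
under our own assumptions, not the algorithm used by adf.test (which, as we
understand it, also includes a trend term and lagged differences). The
variable names are hypothetical, and the seed is arbitrary.

```r
# Sketch: the regression underlying the basic Dickey-Fuller statistic
set.seed(10)                      # arbitrary seed
x <- cumsum(rnorm(100))           # a simulated random walk (alpha = 1)
dx <- diff(x)                     # x[t] - x[t-1], for t = 2, ..., 100
x.lag <- x[-length(x)]            # x[t-1], aligned with dx
df.fit <- lm(dx ~ x.lag)          # slope estimates (alpha - 1)
df.stat <- coef(summary(df.fit))["x.lag", "t value"]
df.stat
```

Under the null hypothesis this t-ratio does not follow a t-distribution, so it
must be compared with Dickey-Fuller critical values; for a regression with an
intercept and a series of this length, the 5% critical value is roughly -2.9.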

An alternative to the augmented Dickey-Fuller test, known as the
Phillips-Perron test (Perron, 1988), is implemented in the R function pp.test.
The distinction between the two tests is that the Phillips-Perron procedure
estimates the autocorrelations in the stationary process $u_t$ directly (using a
kernel smoother) rather than assuming an AR approximation, and for this reason
the Phillips-Perron test is described as semi-parametric. Critical values of the
test statistic are either based on asymptotic theory or calculated from
extensive simulations. There is no evidence to reject the unit root hypothesis, so
we conclude that the UK pound and Euro exchange rates are both likely to
contain unit roots.


Fig. 11.4. Scatter plot of the UK and EU exchange rates. Both rates are per US
dollar.


> pp.test(xrates$UK) 


Phillips-Perron Unit Root Test 


data: xrates$UK 

Dickey-Fuller Z(alpha) = -10.6, Truncation lag parameter = 7, 
p-value = 0.521 

alternative hypothesis: stationary 


> pp.test (xrates$EU) 


Phillips-Perron Unit Root Test 


data: xrates$EU 

Dickey-Fuller Z(alpha) = -6.81, Truncation lag parameter = 7, 
p-value = 0.7297 

alternative hypothesis: stationary 


11.4 Cointegration 


11.4.1 Definition 


Many multiple time series are highly correlated in time. For example, in §11.2 
we found the UK pound and Euro exchange rates very highly correlated. This 
is explained by the similarity of the two economies relative to the US economy. 
Another example is the high correlation between the Australian electricity and 
chocolate production series, which can be reasonably attributed to an increasing
Australian population rather than a causal relationship. In addition, we
demonstrated that two series that are independent and contain unit roots 
(e.g., they follow independent random walks) can show an apparent linear re- 
lationship, due to chance similarity of the random walks over the period of the 
time series, and stated that such a correlation would be spurious. However, 
as demonstrated by the analysis of the UK pounds and Euro exchange rates, 
it is quite possible for two series to contain unit roots and be related. Such 
series are said to be cointegrated. In the case of the exchange rates, a stochas- 
tic trend in the US economy during a period when the European economy is 
relatively stable will impart a common, complementary, stochastic trend to 
the UK pound and Euro exchange rates. We now state the precise definition 
of cointegration. 

Two non-stationary time series $\{x_t\}$ and $\{y_t\}$ are cointegrated if some
linear combination $a x_t + b y_t$, with a and b constant, is a stationary series.


As an example consider a random walk $\{\mu_t\}$ given by $\mu_t = \mu_{t-1} + w_t$,
where $\{w_t\}$ is white noise with zero mean, and two series $\{x_t\}$ and $\{y_t\}$ given
by $x_t = \mu_t + w_{x,t}$ and $y_t = \mu_t + w_{y,t}$, where $\{w_{x,t}\}$ and $\{w_{y,t}\}$ are independent
white noise series with zero mean. Both series are non-stationary, but their
difference $\{x_t - y_t\}$ is stationary since it is a finite linear combination of
independent white noise terms. Thus the linear combination of $\{x_t\}$ and $\{y_t\}$,
with a = 1 and b = -1, produced a stationary series, $\{w_{x,t} - w_{y,t}\}$. Hence $\{x_t\}$
and $\{y_t\}$ are cointegrated and share the underlying stochastic trend $\{\mu_t\}$.

In R, two series can be tested for cointegration using the Phillips-Ouliaris 
test implemented in the function po.test within the tseries library. The 
function requires the series be given in matrix form and produces the results 
for a test of the null hypothesis that the two series are not cointegrated. As an 
example, we simulate two cointegrated series x and y that share the stochastic 
trend mu and test for cointegration using po.test: 


> x <- y <- mu <- rep(0, 1000)
> for (i in 2:1000) mu[i] <- mu[i - 1] + rnorm(1)
> x <- mu + rnorm(1000)
> y <- mu + rnorm(1000)
> adf.test(x)$p.value
[1] 0.502
> adf.test(y)$p.value
[1] 0.544
> po.test(cbind(x, y))

        Phillips-Ouliaris Cointegration Test

data:  cbind(x, y)
Phillips-Ouliaris demeaned = -1020, Truncation lag parameter = 9,
p-value = 0.01


In the example above, the conclusion of the adf.test is to retain the null 
hypothesis that the series have unit roots. The po.test provides evidence 
that the series are cointegrated since the null hypothesis is rejected at the 1% 
level. 
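
An informal check that needs only base R is to regress one simulated series on
the other and compare the lag 1 autocorrelation of a series with that of the
regression residuals. This is a rough sketch under our own assumptions (the
seed and the variable names r1.x and r1.res are ours) and is not a substitute
for a formal test such as po.test.

```r
# Sketch: residuals from a regression of cointegrated series look stationary
set.seed(1)                       # arbitrary seed
n <- 1000
mu <- cumsum(rnorm(n))            # shared stochastic trend (random walk)
x <- mu + rnorm(n)
y <- mu + rnorm(n)
res <- resid(lm(y ~ x))           # estimated stationary linear combination
r1.x   <- acf(x,   plot = FALSE)$acf[2]   # lag 1 autocorrelation of x
r1.res <- acf(res, plot = FALSE)$acf[2]   # lag 1 autocorrelation of residuals
c(series = r1.x, residuals = r1.res)
```

The lag 1 autocorrelation of x should be close to 1, as expected for a series
dominated by a random walk, whilst that of the residuals should be close to 0.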


11.4.2 Exchange rate series 


The code below is an analysis of the UK pound and Euro exchange rate 
series. The Phillips-Ouliaris test shows there is evidence that the series are 
cointegrated, which justifies the use of a regression model. An ARIMA model 
is then fitted to the residuals of the regression model. The ar function is used 
to determine the best order of an AR process. We can investigate the adequacy 
of our cointegrated model by using R to fit a more general ARIMA process to 
the residuals. The best-fitting ARIMA model has d = 0, which is consistent 
with the residuals being a realisation of a stationary process and hence the 
series being cointegrated. 


> po.test(cbind(xrates$UK, xrates$EU)) 


Phillips-Ouliaris Cointegration Test 


data: cbind(xrates$UK, xrates$EU) 
Phillips-Ouliaris demeaned = -21.7, Truncation lag parameter = 10, 
p-value = 0.04118 


> ukeu.lm <- lm(xrates$UK ~ xrates$EU)
> ukeu.res <- resid(ukeu.lm)
> ukeu.res.ar <- ar(ukeu.res)
> ukeu.res.ar$order
[1] 3
> AIC(arima(ukeu.res, order = c(3, 0, 0)))
[1] -9886
> AIC(arima(ukeu.res, order = c(2, 0, 0)))
[1] -9886
> AIC(arima(ukeu.res, order = c(1, 0, 0)))
[1] -9880
> AIC(arima(ukeu.res, order = c(1, 1, 0)))
[1] -9876
Comparing the AICs for the AR(2) and AR(3) models, it is clear there is
little difference and that the AR(2) model would be satisfactory. The example 
above also shows that the AR models provide a better fit to the residual 
series than the ARIMA(1, 1, 0) model, so the residual series may be treated 
as stationary. This supports the result of the Phillips-Ouliaris test since a 
linear combination of the two exchange rates, obtained from the regression 
model, has produced a residual series that appears to be a realisation of a 
stationary process. 
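
The same kind of comparison can be tried on simulated cointegrated series,
for which the answer is known. The sketch below is an illustration under our
own assumptions (arbitrary seed, hypothetical variable names aic.d0 and
aic.d1). An AIC comparison between models with different orders of
differencing is only a rough guide, because the two models are fitted to
series of different lengths.

```r
# Sketch: AIC comparison for residuals of simulated cointegrated series
set.seed(1)                       # arbitrary seed
n <- 1000
mu <- cumsum(rnorm(n))            # shared stochastic trend
x <- mu + rnorm(n)
y <- mu + rnorm(n)
res <- resid(lm(y ~ x))           # residuals of the cointegrating regression
aic.d0 <- AIC(arima(res, order = c(1, 0, 0)))   # stationary AR(1)
aic.d1 <- AIC(arima(res, order = c(1, 1, 0)))   # ARIMA(1, 1, 0)
c(d0 = aic.d0, d1 = aic.d1)
```

Here the residuals are close to white noise by construction, so the model
with d = 0 should have the smaller AIC.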


11.5 Bivariate and multivariate white noise 


Two series $\{w_{x,t}\}$ and $\{w_{y,t}\}$ are bivariate white noise if they are stationary
and their cross-covariance $\gamma_{xy}(k) = \mathrm{Cov}(w_{x,t}, w_{y,t+k})$ satisfies


    $\gamma_{xx}(k) = \gamma_{yy}(k) = \gamma_{xy}(k) = 0$ for all $k \neq 0$    (11.1)


In the equation above, $\gamma_{xx}(0) = \gamma_{yy}(0) = 1$ and $\gamma_{xy}(0)$ may be zero or
non-zero. Hence, bivariate white noise series $\{w_{x,t}\}$ and $\{w_{y,t}\}$ may be regarded as
white noise when considered individually but when considered as a pair may
be cross-correlated at lag 0.

The definition of bivariate white noise readily extends to multivariate white
noise. Let $\gamma_{ij}(k) = \mathrm{Cov}(w_{i,t}, w_{j,t+k})$ be the cross-covariance between the
series $\{w_{i,t}\}$ and $\{w_{j,t}\}$ (i, j = 1, ..., n). Then stationary series $\{w_{1,t}\}$, $\{w_{2,t}\}$, ...,
$\{w_{n,t}\}$ are multivariate white noise if each individual series is white noise and,
for each pair of series ($i \neq j$), $\gamma_{ij}(k) = 0$ for all $k \neq 0$. In other words,
multivariate white noise is a sequence of independent draws from some multivariate
distribution.

Multivariate Gaussian white noise can be simulated with the rmvnorm 
function in the mvtnorm library. The function may take a mean and covari- 
ance matrix as a parameter input, and the dimensions of these determine the 
dimension of the output matrix. In the following example, the covariance
matrix is 2 x 2, so the output variable w is bivariate with 1000 simulated white
noise values in each of two columns. An arbitrary value of 0.8 is chosen for
the correlation to illustrate the use of the function.


> library(mvtnorm)
> cov.mat <- matrix(c(1, 0.8, 0.8, 1), nr = 2)
> w <- rmvnorm(1000, sigma = cov.mat)
> cov(w)
      [,1]  [,2]
[1,] 1.073 0.862
[2,] 0.862 1.057
> wx <- w[, 1]
> wy <- w[, 2]
> ccf(wx, wy, main = "")
The ccf function verifies that the cross-correlations are approximately zero
for all non-zero lags (Fig. 11.5). As an exercise, check that the series in each
column of w are approximately white noise using the acf function.
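
If the mvtnorm library is not available, bivariate Gaussian white noise with a
given covariance matrix can also be generated in base R from independent
standard normal values and a Cholesky factor of the covariance matrix. The
following is a sketch under our own assumptions (arbitrary seed; the names z
and r are ours) rather than a description of how rmvnorm works internally.

```r
# Sketch: bivariate Gaussian white noise via a Cholesky factor (base R only)
set.seed(1)                                  # arbitrary seed
cov.mat <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
z <- matrix(rnorm(2000), ncol = 2)           # independent N(0, 1) columns
# chol() returns upper-triangular R with t(R) %*% R = cov.mat,
# so each row of z %*% R has covariance matrix cov.mat
w <- z %*% chol(cov.mat)
cov(w)                                       # should be close to cov.mat
r <- ccf(w[, 1], w[, 2], lag.max = 10, plot = FALSE)
r$acf[r$lag == 0]                            # lag 0 cross-correlation
max(abs(r$acf[r$lag != 0]))                  # largest at non-zero lags
```

The lag 0 cross-correlation should be near the chosen value of 0.8, with the
cross-correlations at all other lags near zero.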

One simple use of bivariate or multivariate white noise is in the method 
of prewhitening. Separate SARIMA models are fitted to multiple time series 
variables so that the residuals of the fitted models appear to be a realisation 
of multivariate white noise. The SARIMA models can then be used to forecast 
the expected values of each time series variable, and multivariate simulations 
can be produced by adding multivariate white noise terms to the forecasts. 
The method works well provided the multiple time series have no common 
stochastic trends and the cross-correlation structure is restricted to the error 
process. 


Fig. 11.5. Cross-correlation of simulated bivariate Gaussian white noise


11.6 Vector autoregressive models 


Two time series, $\{x_t\}$ and $\{y_t\}$, follow a vector autoregressive process of order
1 (denoted VAR(1)) if


    $x_t = \theta_{11} x_{t-1} + \theta_{12} y_{t-1} + w_{x,t}$

    $y_t = \theta_{21} x_{t-1} + \theta_{22} y_{t-1} + w_{y,t}$    (11.2)


where $\{w_{x,t}\}$ and $\{w_{y,t}\}$ are bivariate white noise and $\theta_{ij}$ are model
parameters. If the white noise sequences are defined with mean 0 and the process
is stationary, both time series $\{x_t\}$ and $\{y_t\}$ have mean 0 (Exercise 1). The
simplest way of incorporating a mean is to define $\{x_t\}$ and $\{y_t\}$ as deviations
from mean values. Equation (11.2) can be rewritten in matrix notation as


    $Z_t = \Theta Z_{t-1} + w_t$    (11.3)


where

    $Z_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}, \quad \Theta = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix}, \quad w_t = \begin{pmatrix} w_{x,t} \\ w_{y,t} \end{pmatrix}$

Equation (11.3) is a vector expression for an AR(1) process; i.e., the process
is vector autoregressive. Using the backward shift operator, Equation (11.3)
can also be written


    $(I - \Theta B) Z_t = \theta(B) Z_t = w_t$    (11.4)


where $\theta$ is a matrix polynomial of order 1 and I is the 2 x 2 identity matrix.
A VAR(1) process can be extended to a VAR(p) process by allowing $\theta$ to be
a matrix polynomial of order p. A VAR(p) model for m time series is also
defined by Equation (11.4), in which I is the m x m identity matrix, $\theta$ is a
polynomial of m x m matrices of parameters, $Z_t$ is an m x 1 matrix of time
series variables, and $w_t$ is multivariate white noise. For a VAR model, the
characteristic equation is given by a determinant of a matrix. Analogous to
AR models, a VAR(p) model is stationary if the roots of the determinant $|\theta(x)|$
all exceed unity in absolute value. For the VAR(1) model, the determinant is
given by


    $\begin{vmatrix} 1 - \theta_{11} x & -\theta_{12} x \\ -\theta_{21} x & 1 - \theta_{22} x \end{vmatrix} = (1 - \theta_{11} x)(1 - \theta_{22} x) - \theta_{12}\theta_{21} x^2$    (11.5)


The R functions polyroot and Mod can be used to test whether a VAR model
is stationary, where the function polyroot just takes a vector of polynomial
coefficients as an input parameter. For example, consider the VAR(1) model
with parameter matrix $\Theta = \begin{pmatrix} 0.4 & 0.3 \\ 0.2 & 0.1 \end{pmatrix}$. Then the characteristic equation is
given by

    $\begin{vmatrix} 1 - 0.4x & -0.3x \\ -0.2x & 1 - 0.1x \end{vmatrix} = 1 - 0.5x - 0.02x^2$    (11.6)


The absolute value of the roots of the equation is given by
> Mod(polyroot(c(1, -0.5, -0.02)))
[1] 1.86 26.86

From this we can deduce that the VAR(1) model is stationary since both roots 
exceed unity in absolute value. 
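
For a 2 x 2 parameter matrix, the coefficients in Equation (11.5) are 1, minus
the trace of the matrix, and its determinant, so the polynomial can be built
directly from the matrix. The sketch below (variable names are our own)
repeats the calculation for the example above and also uses the equivalent
condition that all eigenvalues of the parameter matrix are less than unity in
absolute value, since the roots of the determinant are the reciprocals of the
eigenvalues.

```r
# Sketch: stationarity check for a bivariate VAR(1) from its parameter matrix
theta <- matrix(c(0.4, 0.3,
                  0.2, 0.1), nrow = 2, byrow = TRUE)
# det(I - theta * x) = 1 - trace(theta) * x + det(theta) * x^2 (2 x 2 case)
char.coef <- c(1, -sum(diag(theta)), det(theta))
root.mod <- Mod(polyroot(char.coef))   # moduli of the roots
eig.mod <- Mod(eigen(theta)$values)    # moduli of the eigenvalues
all(root.mod > 1)                      # stationary if all roots exceed unity
all(eig.mod < 1)                       # equivalent eigenvalue condition
```

The eigenvalue form is convenient for higher dimensions, where writing out the
determinant by hand becomes tedious.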

The parameters of a VAR(p) model can be estimated using the ar function 
in R, which selects a best-fitting order p based on the smallest AIC. Using the 
simulated bivariate white noise process of §11.5 and the parameters from the 
stationary VAR(1) model given above, a VAR(1) process is simulated below 
and the parameters from the simulated series estimated using ar. 


> x <- y <- rep(0, 1000)
> x[1] <- wx[1]
> y[1] <- wy[1]
> for (i in 2:1000) {
    x[i] <- 0.4 * x[i - 1] + 0.3 * y[i - 1] + wx[i]
    y[i] <- 0.2 * x[i - 1] + 0.1 * y[i - 1] + wy[i]
  }
> xy.ar <- ar(cbind(x, y))
> xy.ar$ar[, , ]
      x     y
x 0.399 0.321
y 0.208 0.104


As expected, the parameter estimates are close to the underlying model val- 
ues. If the simulation is repeated many times with different realisations of 
the bivariate white noise, the sampling distribution of the estimators of the 
parameters in the model can be approximated by the histograms of the esti- 
mates together with the correlations between estimates. This is the principle 
used to construct bootstrap confidence intervals for model parameters when 
they have been estimated from time series. 

The bootstrap simulation is set up using point estimates of the parameters
in the model, including the variance of the white noise terms. Then time series
of the same length as the historical records are simulated and the parameters
estimated. A $(1 - \alpha) \times 100\%$ confidence interval for a parameter is between the
lower and upper $\alpha/2$ quantiles of the empirical sampling distribution of its
estimates.
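
The procedure can be sketched for a bivariate VAR(1) model using base R only.
Everything named below is our own illustrative assumption: the helper
functions sim.var1 and fit.var1 are hypothetical, the model is fitted by
ordinary least squares with lm rather than with ar, a simulated series stands
in for the historical record, and 200 bootstrap replicates is an arbitrary
choice.

```r
# Sketch: parametric bootstrap interval for one VAR(1) parameter
set.seed(1)                                   # arbitrary seed
sim.var1 <- function(n, theta, sigma) {
  # bivariate Gaussian white noise with covariance matrix sigma
  w <- matrix(rnorm(2 * n), ncol = 2) %*% chol(sigma)
  z <- matrix(0, n, 2)
  z[1, ] <- w[1, ]
  for (i in 2:n) z[i, ] <- theta %*% z[i - 1, ] + w[i, ]
  z
}
fit.var1 <- function(z) {
  n <- nrow(z)
  fit <- lm(z[-1, ] ~ z[-n, ] - 1)            # regress z[t] on z[t-1]
  list(theta = t(coef(fit)),                  # rows index the equations
       sigma = crossprod(resid(fit)) / (n - 3))
}
theta0 <- matrix(c(0.4, 0.3, 0.2, 0.1), 2, byrow = TRUE)
sigma0 <- matrix(c(1, 0.8, 0.8, 1), 2)
z <- sim.var1(500, theta0, sigma0)            # stands in for historical data
est <- fit.var1(z)                            # point estimates
B <- 200                                      # number of bootstrap replicates
boot <- replicate(B,
  fit.var1(sim.var1(nrow(z), est$theta, est$sigma))$theta[1, 1])
ci <- quantile(boot, c(0.025, 0.975))         # 95% interval for theta_11
ci
```

The same replicates could be used for the other parameters by storing the
whole estimated matrix in each replicate rather than a single element.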


11.6.1 VAR model fitted to US economic series 


A quarterly US economic series (1954-1987) is available within the tseries
library. A best-fitting VAR model is fitted to the (mean-adjusted) gross
national product (GNP) and real money (M1) in the following example.¹
Ordinary least squares is used to fit the model to the mean-adjusted series,
with dmean set to TRUE and intercept set to FALSE since the latter parameter
will not be required.


> library(tseries)
> data(USeconomic)
> US.ar <- ar(cbind(GNP, M1), method = "ols", dmean = T, intercept = F)
> US.ar$ar
, , GNP

       GNP      M1
1  1.27181 -0.0338
2 -0.00423  0.0635
3 -0.26715 -0.0286

, , M1

     GNP     M1
1  1.167  1.588
2 -0.694 -0.484
3 -0.510 -0.129

> acf(US.ar$res[-c(1:3), 1])
> acf(US.ar$res[-c(1:3), 2])


¹ Real money means income adjusted by inflation.

From the code above, we see that the best-fitting VAR model is of order 3. 
The correlogram of the residual series indicates that the residuals are ap- 
proximately bivariate white noise, thus validating the assumptions for a VAR 
model (Fig. 11.6). 
Fig. 11.6. Residual correlograms for the VAR(3) model fitted to the US economic
series: (a) residuals for GNP; (b) residuals for M1. 


To check for stationarity, the characteristic function can be evaluated
using the determinant:


    $\left| \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
      - \begin{pmatrix} 1.272 & -0.0338 \\ 1.167 & 1.588 \end{pmatrix} x
      - \begin{pmatrix} -0.00423 & 0.06354 \\ -0.6942 & -0.4839 \end{pmatrix} x^2
      - \begin{pmatrix} -0.2672 & -0.02859 \\ -0.5103 & -0.1295 \end{pmatrix} x^3 \right|$

    $= 1 - 2.859x + 2.547x^2 - 0.3232x^3 - 0.5265x^4 + 0.1424x^5 + 0.01999x^6$


From this it can be verified that the fitted VAR(3) model is stationary since 
all the roots exceed unity in absolute value: 


> Mod( polyroot(c(1,-2.859,2.547,-0.3232, -0.5265, 0.1424, 0.01999)) ) 
[1] 1.025269 1.025269 1.257038 1.598381 2.482308 9.541736 


At the time of writing, an algorithm was not available for extracting stan- 
dard errors of VAR parameter estimates from an ar object. Estimates of these 
errors could be obtained using a bootstrap method or a function from another 
library. In the vars package (Pfaff, 2008), available on the R website, the VAR 
function can be used to estimate standard errors of fitted VAR parameters. 
Hence, this package was downloaded and installed and is used to extract the 
standard errors in the code below. Those estimates that are not significantly 
different from zero are removed before making a prediction for the following 
year. The vars package can also allow for any trends in the data, so we also 
include a trend term for the GNP series since US GNP will tend to increase 
with time due to an expanding population and increased productivity. 


> library(vars)
> US.var <- VAR(cbind(GNP, M1), p = 3, type = "trend")
> coef(US.var)
$GNP
       Estimate Std. Error t value Pr(>|t|)
GNP.l1  1.07537     0.0884 12.1607 5.48e-23
M1.l1   1.03615     0.4103  2.5254 1.28e-02
GNP.l2 -0.00678     0.1328 -0.0511 9.59e-01
M1.l2  -0.30038     0.7543 -0.3982 6.91e-01
GNP.l3 -0.12724     0.0851 -1.4954 1.37e-01
M1.l3  -0.56370     0.4457 -1.2648 2.08e-01
trend   1.03503     0.4352  2.3783 1.89e-02

$M1
       Estimate Std. Error t value Pr(>|t|)
GNP.l1  -0.0439     0.0191  -2.298 2.32e-02
M1.l1    1.5923     0.0887  17.961 1.51e-36
GNP.l2   0.0616     0.0287   2.148 3.36e-02
M1.l2   -0.4891     0.1630  -3.001 3.25e-03
GNP.l3  -0.0175     0.0184  -0.954 3.42e-01
M1.l3   -0.1041     0.0963  -1.081 2.82e-01
trend    0.0116     0.0940   0.123 9.02e-01


> US.var <- VAR(cbind(GNP, M1), p = 2, type = "trend")
> coef(US.var)
$GNP
       Estimate Std. Error t value Pr(>|t|)
GNP.l1    1.141     0.0845   13.51 1.83e-26
M1.l1     1.330     0.3391    3.92 1.41e-04
GNP.l2   -0.200     0.0823   -2.43 1.67e-02
M1.l2    -1.157     0.3488   -3.32 1.19e-03
trend     1.032     0.4230    2.44 1.61e-02

$M1
       Estimate Std. Error t value Pr(>|t|)
GNP.l1 -0.03372     0.0181 -1.8623 6.48e-02
M1.l1   1.64898     0.0727 22.6877 7.33e-47
GNP.l2  0.03419     0.0176  1.9384 5.48e-02
M1.l2  -0.65016     0.0748 -8.6978 1.35e-14
trend   0.00654     0.0906  0.0722 9.43e-01

> acf(resid(US.var)[, 1])
> acf(resid(US.var)[, 2])


Fig. 11.7. Residual correlograms for the VAR(2) model fitted to the US economic
series: (a) residuals for GNP; (b) residuals for M1.
Below we give the predicted values for the next year of the series, which are
then added to a time series plot for each variable (Fig. 11.8).


Fig. 11.8. US economic series: (a) time plot for GNP (from 1981) with added
predicted values (dotted) for the next year; (b) time plot for M1 (from 1981) with
added predicted values (dotted) for the next year.


> US.pred <- predict(US.var, n.ahead = 4) 
> US.pred 


$GNP 

fcst lower upper CI 
[1,] 3958 3911 4004 46.2 
[2,] 3986 3914 4059 72.6 
[3,] 4014 3921 4107 93.0 
[4,] 4043 3933 4153 109.9 


$M1 

fcst lower upper CI 
[1,] 631 621 641 9.9 
[2,] 632 613 651 19.0 
[3,] 634 606 661 27.5 
[4,] 636 601 671 35.1 
> GNP.pred <- ts(US.pred$fcst$GNP[, 1], st = 1988, fr = 4)
> M1.pred <- ts(US.pred$fcst$M1[, 1], st = 1988, fr = 4)
> ts.plot(cbind(window(GNP, start = 1981), GNP.pred), lty = 1:2)
> ts.plot(cbind(window(M1, start = 1981), M1.pred), lty = 1:2)
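
The point forecasts from a VAR model are obtained by applying the fitted
equations recursively, with future white noise terms replaced by their
expected value of zero. As a simplified sketch (a VAR(1) model without a
trend term, using the parameter matrix of the earlier simulated example and a
hypothetical pair of last observed values z.last), the recursion is:

```r
# Sketch: recursive point forecasts for a mean-adjusted VAR(1) model
theta <- matrix(c(0.4, 0.3,
                  0.2, 0.1), nrow = 2, byrow = TRUE)
z.last <- c(1, -0.5)     # hypothetical last observed (mean-adjusted) values
n.ahead <- 4
fcst <- matrix(NA, n.ahead, 2, dimnames = list(NULL, c("x", "y")))
z <- z.last
for (h in 1:n.ahead) {
  z <- as.vector(theta %*% z)   # forecast at lead h from forecast at lead h - 1
  fcst[h, ] <- z
}
fcst
```

For a stationary model the forecasts decay towards the mean (zero here) as the
lead time increases. The predict function used above also supplies forecast
intervals, which this sketch does not attempt.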


11.7 Summary of R commands 


adf.test  Dickey-Fuller test for unit roots 

pp.test  Phillips-Perron test for unit roots 

rmvnorm multivariate white noise simulation 

po.test  Phillips-Ouliaris cointegration test 

ar Fits the VAR model based on the smallest AIC 

VAR Fits the VAR model based on least squares 
(vars package required) 


11.8 Exercises 


1. Show that if a VAR(1) process driven by white noise with mean 0, as
defined in Equation 11.5, is stationary, then it has a mean of 0. Deduce
that if a VAR(p) process driven by white noise with mean 0 is stationary,
then it has a mean of 0. [Hint: Take expected values of both sides of
Equation 11.5 and explain why the inverse of $I - \Theta$ exists.]


2. For what values of a is the model below stationary?


    $x_t = 0.9 x_{t-1} + a y_{t-1} + w_{x,t}$

    $y_t = a x_{t-1} + 0.9 y_{t-1} + w_{y,t}$


3. This question uses the data in stockmarket.dat, which contains stock
market data for seven cities for the period January 6, 1986 to December 
31, 1997. Download the data via the book website and put the data into 
a variable in R. 

a) Use an appropriate statistical test to test whether the London and/or 
the New York series have unit roots. Does the evidence from the sta- 
tistical tests suggest the series are stationary or non-stationary? 

b) Let $\{x_t\}$ represent the London series (Lond) and $\{y_t\}$ the New York
series (NY). Fit the following VAR(1) model, giving a summary output
containing the fitted parameters and any appropriate statistical tests:


    $x_t = a_0 + a_1 x_{t-1} + a_2 y_{t-1} + w_{x,t}$

    $y_t = b_0 + b_1 x_{t-1} + b_2 y_{t-1} + w_{y,t}$
c) Which series influences the other the most? Why might this happen?

d) Test the London and New York series for cointegration.

e) Fit the model below, giving a summary of the model parameters and
any appropriate statistical tests.


    $x_t = a_0 + a_1 y_t + w_t$


f) Test the residual series for the previous fitted model for unit roots.
Does this support or contradict the result in part (d)? Explain your
answer.


4. a) Using the VAR function in the vars package, fit a multivariate VAR
model to the four economic variables in the Canadian data (which
can be loaded from within the vars package with the command
data(Canada)).

b) Using the fitted VAR model, make predictions for the next year. Add
these predictions to a time series plot of each variable.


5. a) Fit an ARIMA(1, 1, 0)(1, 1, 1)$_{12}$ model to the logarithm of the
electricity production series. Verify that the residuals are approximately
white noise.

b) Fit the same model as in (a) to the logarithm of the chocolate
production series. Again, verify that the residuals are approximately white
noise.

c) Plot the cross-correlogram of the residuals of the two fitted ARIMA
models, and verify that the lag 0 correlation is significantly different
from zero. Give a possible reason why this may happen.

d) Forecast values for the next month for each series, and add a simulated
bivariate white noise term to each forecast. This gives one possible
realisation. Repeat the process ten times to give ten possible future
scenarios for the next month's production for each series.


12 


State Space Models 


12.1 Purpose 


The state space formulation for time series models is quite general and encom- 
passes most of the models we have considered so far. However, it is usually 
simpler to use the specific time series models we have already introduced when 
they are appropriate for the physical situation. Here, we shall focus on ap- 
plications for which we require parameters to adapt over time, and to do so 
more quickly than in a Holt-Winters model. The recent turmoil on the world’s 
stock exchanges¹ is a dramatic reminder that time series are subject to sudden
changes. Another desirable feature of state space models is that they can
incorporate time series of predictor variables in a straightforward manner. 
Control engineers have used a state space representation of physical sys- 
tems as input, state, and output variables related by first-order linear differen- 
tial equations since the 1950s, and Kalman and Bucy published their famous 
paper on filtering in 1961 (Kalman and Bucy, 1961). Plackett (1950) published 
related, but less general, work on the adaptive estimation of coefficients in re- 
gression models and gave some historical background to the problem. In the 
control context, the state variables define the dynamics of some physical sys- 
tem and might, for instance, be displacements and velocities. Typically, only 
some of these state variables can be measured directly, and these measure- 
ments are subject to noise. The objective of the Kalman filter is to infer values 
for all the state variables from the noisy measurements. The estimated values 
of the state variables are then used to control the system. Feedback control 
systems are the essence of robotics, and some applications are cruise control
in automobiles, autopilots in aircraft, and the planetary explorer rover
Sojourner (the Mars Pathfinder mission was launched on December 4, 1996).


¹ Notable financial events in 2008 included the US government takeover of Fannie
Mae and Freddie Mac on September 7, the rejection of the first bailout bill by 
the US House of Representatives on September 29, and the passing of the US 
Emergency Economic Stabilization Act of 2008 on October 3. 


P.S.P. Cowpertwait and A.V. Metcalfe, Introductory Time Series with R,
Use R, DOI 10.1007/978-0-387-88698-5_12,
© Springer Science+Business Media, LLC 2009
The chemical process industry provides many other applications for control
engineers. Typically, states will be concentrations, temperatures, and pressures,
and the controller will actuate burners, stirrers, and pumps. Digital computers
are an essential feature of modern control systems, and discrete-time models
tend to be used in place of continuous-time models, with differences replacing
derivatives and time series replacing continuous (analog) signals.

In this chapter, we focus on economic time series. Usually, the states will 
be unknown coefficients in the linear models and the equations that represent 
changes in states will be rather simple. Nevertheless, the concept of such 
parameters changing rather than being fixed is a departure from most of 
the models we have considered so far, the exception being the Holt-Winters 
forecasting method. A Bayesian approach is ideal for the development of a 
state space model. 


12.2 Linear state space models 


12.2.1 Dynamic linear model 


We adopt the notation used in Pole et al. (1994), who refer to state space
models as dynamic linear models. The values of the state at time $t$ are represented
by a column matrix $\theta_t$ and are a linear combination of the values of the
state at time $t-1$ and random variation (system noise) from a multivariate
normal distribution. The linear combination of values of the state at time $t-1$
is defined by $G_t$, and the variance-covariance matrix of the system noise is $W_t$.
The observation at time $t$ is denoted by a column matrix $y_t$ that is a linear
combination of the states, determined by a matrix $F_t$, and random variation
(measurement noise) from a normal distribution with variance-covariance matrix
$V_t$. The random variation has mean zero and is uncorrelated over time.
All the matrices can be time varying, but in many applications $G$ is constant.
The state space model is summarised by the equations

$$
\begin{aligned}
y_t &= F_t' \theta_t + v_t \\
\theta_t &= G_t \theta_{t-1} + w_t
\end{aligned}
\tag{12.1}
$$

where $\theta_0 \sim N(m_0, C_0)$, $v_t \sim N(0, V_t)$, and $w_t \sim N(0, W_t)$.
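
To make Equation (12.1) concrete, the following is a minimal base R sketch that simulates a series from a dynamic linear model with constant matrices. The function name `simulate_dlm` and its argument names are our own choices for illustration and are not part of any package; `Fm` stands for the matrix $F$ so that R's built-in `F` is not masked.

```r
# Sketch only: simulate_dlm() is a hypothetical helper, not an sspir function.
# Fm is the p x m matrix F, G is p x p, V is m x m and W is p x p,
# all assumed constant over time.
simulate_dlm <- function(n, Fm, G, V, W, m0, C0) {
  p <- nrow(G)   # number of state components
  m <- ncol(Fm)  # number of observed series
  # Draw one vector from N(mu, Sigma) via the Cholesky factor of Sigma
  rmvn <- function(mu, Sigma) {
    as.vector(mu) + as.vector(t(chol(Sigma)) %*% rnorm(length(mu)))
  }
  theta <- matrix(NA_real_, nrow = n, ncol = p)
  y     <- matrix(NA_real_, nrow = n, ncol = m)
  theta_prev <- rmvn(m0, C0)                     # theta_0 ~ N(m0, C0)
  for (i in 1:n) {
    # System equation: theta_t = G theta_{t-1} + w_t
    theta_i <- as.vector(G %*% theta_prev) + rmvn(rep(0, p), W)
    # Observation equation: y_t = F' theta_t + v_t
    y[i, ]  <- as.vector(t(Fm) %*% theta_i) + rmvn(rep(0, m), V)
    theta[i, ] <- theta_i
    theta_prev <- theta_i
  }
  list(y = y, theta = theta)
}

# Example: a random walk observed with noise (F = G = 1, V = 2, W = 0.1)
set.seed(1)
sim <- simulate_dlm(n = 50, Fm = matrix(1), G = matrix(1),
                    V = matrix(2), W = matrix(0.1),
                    m0 = 25, C0 = matrix(10))
```

The returned list holds the simulated observations in `sim$y` and the unobserved states in `sim$theta`, so the two can be plotted together to see how the noise obscures the state.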

A specific, but very useful, application of state space models is to generalise 
regression models so that the parameters can vary over time. For example, the 
sales manager in a house building company might use the following model to 
allow for the influence of a general level (L) of sales in the sector and the 
company's own pricing (P) policy on the company’s house sales (S): 


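$$
\begin{aligned}
S_t &= L_t + \beta_t P_t + v_t \\
L_t &= L_{t-1} + \Delta L_t \\
\beta_t &= \beta_{t-1} + \Delta\beta_t
\end{aligned}
\tag{12.2}
$$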
The first equation is a linear regression with price as the predictor variable.
However, the model allows the intercept term, the level, and the coefficient
of price to vary over time, and this makes it far more realistic for the house
building market. The $v_t$, $\Delta L_t$, and $\Delta\beta_t$ are random deviations with mean zero
that are independent over time, although $\Delta L_t$ and $\Delta\beta_t$ can be correlated. The
relative magnitudes of the variances of these components of error, which are
the entries in the matrices $V_t$ and $W_t$, determine the variability of the parameters.
If $W_t = 0$, the state space model reduces to the standard regression
with constant parameters. In state space form

$$
y_t = S_t, \quad
\theta_t = \begin{pmatrix} L_t \\ \beta_t \end{pmatrix}, \quad
F_t = \begin{pmatrix} 1 \\ P_t \end{pmatrix}, \quad
G_t = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
w_t = \begin{pmatrix} \Delta L_t \\ \Delta\beta_t \end{pmatrix}
$$

The subscript $t$ on the matrix $G$ is redundant, as $G$ is constant in this application.

The system is said to be observable if it is possible to infer the values of all
the components of the state from the noisy observations (Exercise 5). If the 
system is observable, we can distinguish prediction, filtering, and smoothing. 
Prediction is the forecasting of future values of the state, filtering is making 
the best estimate of the current values of the state from the record of obser- 
vations, including the current observation, and smoothing is making the best 
estimates of past values of the state given the record of observations. Filtering 
is particularly important because it is the basis for control algorithms and 
forecasting. 
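
For constant $F$ and $G$, a standard check from control theory (the one set out in Exercise 5) is that the matrix formed by stacking $F'$, $F'G$, ..., $F'G^{p-1}$ has rank $p$, the number of state components. The sketch below computes that rank in base R; `observability_rank` is a hypothetical helper name, and the two test models are our own illustrations rather than examples from the text.

```r
# Sketch only: observability_rank() is a hypothetical helper.
# Fm is the p x m matrix F and G is the p x p matrix G, both constant.
observability_rank <- function(Fm, G) {
  p <- nrow(G)
  block <- t(Fm)            # first block row: F'
  O <- block
  if (p > 1) {
    for (j in 2:p) {
      block <- block %*% G  # next block row: F' G^(j-1)
      O <- rbind(O, block)
    }
  }
  qr(O)$rank                # numerical rank of the stacked matrix
}

# A level plus drift model: the level is observed and the drift feeds the level,
# so G = [1 1; 0 1] and F = (1, 0)'
rank_trend <- observability_rank(matrix(c(1, 0), ncol = 1),
                                 matrix(c(1, 0, 1, 1), nrow = 2))

# Two unrelated random walks with only the first observed: G = I and F = (1, 0)'
rank_unlinked <- observability_rank(matrix(c(1, 0), ncol = 1), diag(2))
```

Here `rank_trend` is 2, so both state components can be inferred, whereas `rank_unlinked` is 1 because the second component never influences the observations.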


12.2.2 Filtering* 


Let $D_t$ represent the data up until time $t$. In most applications, the data are
the time series of observations, but the notation does allow for the time series
to be augmented by any additional information. In the following, we express
the data up until time $t$, $D_t$, as the combination of data up until time $t-1$
and the observation at time $t$, $(D_{t-1}, y_t)$. Bayes's Theorem gives

$$
p(\theta_t \mid D_{t-1}, y_t) = \frac{p(y_t \mid \theta_t)\, p(\theta_t \mid D_{t-1})}{p(y_t)} \tag{12.3}
$$

and is usually applied without the normalising constant, so that we can write

$$
p(\theta_t \mid D_{t-1}, y_t) \propto p(y_t \mid \theta_t)\, p(\theta_t \mid D_{t-1}) \tag{12.4}
$$


That is, the posterior density of the state at time $t$, given data up until time
$t$, is proportional to the product of the probability density of the observation
at time $t$ given the state at time $t$, referred to as the likelihood, and the prior
density of the state at time $t$, given data up until time $t-1$. If the prior distribution
and the likelihood are both normal, Bayes's Theorem provides a nice
analytic form. In the univariate case, the mean of the posterior distribution
is a weighted mean of the mean of the prior distribution and the observation
with weights proportional to their precisions.² Also, the precision of the posterior
distribution is the sum of the precision of the prior distribution and the
precision of the observation (Exercise 6). The extension of this result to the
multivariate normal distribution leads to the result

$$
\theta_t \mid D_t \sim N(m_t, C_t) \tag{12.5}
$$

where $m_t$ and $C_t$ are calculated iteratively, for $t$ from 1 up to $n$, from the
following algorithm, which is known as the Kalman filter. Remember that $m_0$
and $C_0$ are specified as part of the model.
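
The univariate result quoted above can be checked numerically. The following base R sketch combines a normal prior with a single normal observation; `posterior_normal` is a hypothetical helper name, and the numbers are illustrative values chosen by us.

```r
# Sketch only: posterior_normal() is a hypothetical helper.
# Combines a N(prior_mean, prior_var) prior with one observation y
# whose measurement variance is obs_var.
posterior_normal <- function(prior_mean, prior_var, y, obs_var) {
  prior_prec <- 1 / prior_var            # precision = 1 / variance
  obs_prec   <- 1 / obs_var
  post_prec  <- prior_prec + obs_prec    # precisions add
  post_mean  <- (prior_prec * prior_mean + obs_prec * y) / post_prec
  c(mean = post_mean, var = 1 / post_prec)
}

# Prior N(25, 10), observation y = 20 with variance 2
post <- posterior_normal(prior_mean = 25, prior_var = 10, y = 20, obs_var = 2)
```

With these values the weights are 0.1 and 0.5, so the posterior mean is $12.5/0.6 \approx 20.83$ and the posterior variance is $1/0.6 \approx 1.67$. The observation dominates because it is five times as precise as the prior.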


Kalman filter

The prior distribution for $\theta_t$, the likelihood, and the posterior distribution for
$\theta_t$ are (respectively)

$$
\theta_t \mid D_{t-1} \sim N(a_t, R_t) \qquad
y_t \mid \theta_t \sim N(F_t'\theta_t, V_t) \qquad
\theta_t \mid D_t \sim N(m_t, C_t)
$$

Then, for $t = 1, \ldots, n$, the algorithm is given by

$$
\begin{aligned}
a_t &= G_t m_{t-1} & f_t &= F_t' a_t \\
R_t &= G_t C_{t-1} G_t' + W_t & Q_t &= F_t' R_t F_t + V_t \\
e_t &= y_t - f_t & A_t &= R_t F_t Q_t^{-1} \\
m_t &= a_t + A_t e_t & C_t &= R_t - A_t Q_t A_t'
\end{aligned}
$$

In this algorithm, $f_t$ is the forecast value of the observation at time $t$, the
forecast being made at time $t-1$. It follows that $e_t$ is the forecast error. The
posterior mean is a weighted sum of the prior mean and the forecast error.
Notice that the variance of the posterior distribution, $C_t$, is less than the
variance of the prior distribution, $R_t$.
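
The recursions above translate almost line for line into base R. The following is a minimal sketch under the assumption that $G$, $W$, and $V$ are constant while $F_t$ may vary with time; `kalman_filter_sketch` and its argument names are hypothetical and are not the interface of any package.

```r
# Sketch only: kalman_filter_sketch() is a hypothetical helper.
# y:    n x m matrix of observations
# Fmat: function(tt) returning the p x m matrix F_t
# G, W: p x p matrices; V: m x m matrix; m0: prior mean; C0: p x p prior variance
kalman_filter_sketch <- function(y, Fmat, G, W, V, m0, C0) {
  n <- nrow(y)
  p <- length(m0)
  m <- matrix(m0, ncol = 1)
  C <- C0
  m_out <- matrix(NA_real_, nrow = n, ncol = p)
  C_out <- vector("list", n)
  for (tt in 1:n) {
    Ft <- Fmat(tt)
    a <- G %*% m                         # prior mean a_t
    R <- G %*% C %*% t(G) + W            # prior variance R_t
    f <- t(Ft) %*% a                     # one-step-ahead forecast f_t
    Q <- t(Ft) %*% R %*% Ft + V          # forecast variance Q_t
    e <- matrix(y[tt, ], ncol = 1) - f   # forecast error e_t
    A <- R %*% Ft %*% solve(Q)           # weight given to the forecast error
    m <- a + A %*% e                     # posterior mean m_t
    C <- R - A %*% Q %*% t(A)            # posterior variance C_t
    m_out[tt, ] <- m
    C_out[[tt]] <- C
  }
  list(m = m_out, C = C_out)
}

# Example: five observations of a slowly changing level
y <- matrix(c(19, 21, 20, 22, 18), ncol = 1)
fit <- kalman_filter_sketch(y, Fmat = function(tt) matrix(1), G = matrix(1),
                            W = matrix(0.1), V = matrix(2),
                            m0 = 25, C0 = matrix(10))
```

In the example, the first prior variance is $R_1 = 10.1$ and the first forecast variance is $Q_1 = 12.1$, so the first observation pulls the estimate most of the way from 25 towards 19. The posterior variances in `fit$C` are smaller than the corresponding prior variances, as noted above.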


12.2.3 Prediction* 


Predictions start from the posterior estimate of the state, obtained from the 
Kalman filter, on the day (t) on which the forecast is made. We can make 
one-step-ahead forecasts in the following manner: 


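$$
E[y_{t+1} \mid D_t] = E[F_{t+1}'\theta_{t+1} + v_{t+1} \mid D_t]
= F_{t+1}' E[\theta_{t+1} \mid D_t] = F_{t+1}' a_{t+1} = f_{t+1}
\tag{12.6}
$$

$$
\begin{aligned}
\mathrm{Var}[y_{t+1} \mid D_t] &= \mathrm{Var}[F_{t+1}'\theta_{t+1} + v_{t+1} \mid D_t]
= F_{t+1}' \mathrm{Var}[\theta_{t+1} \mid D_t] F_{t+1} + V_{t+1} \\
&= F_{t+1}' R_{t+1} F_{t+1} + V_{t+1} \\
&= Q_{t+1}
\end{aligned}
\tag{12.7}
$$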


² Precision is the reciprocal of the variance.
The general formula for a forecast at time $t$ for $k$ steps ahead is

$$
y_{t+k} \mid D_t \sim N(f_{t+k|t}, Q_{t+k|t}) \tag{12.8}
$$

where

$$
\begin{aligned}
f_{t+k|t} &= F_{t+k}' G^{k-1} a_{t+1} \\
Q_{t+k|t} &= F_{t+k}' R_{t+k|t} F_{t+k} + V_{t+k} \\
R_{t+k|t} &= G^{k-1} R_{t+1} (G^{k-1})' + \sum_{j=2}^{k} G^{k-j} W_{t+j} (G^{k-j})'
\end{aligned}
\tag{12.9}
$$
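
When $F$, $G$, $V$, and $W$ are constant, the sums in Equation (12.9) can be accumulated one step at a time rather than by forming matrix powers. The sketch below does this in base R; `forecast_k_steps` is a hypothetical helper, and `a1` and `R1` denote $a_{t+1}$ and $R_{t+1}$ from the Kalman filter.

```r
# Sketch only: forecast_k_steps() is a hypothetical helper.
# a1, R1: prior mean and variance of the state one step ahead (a_{t+1}, R_{t+1})
# Fm (p x m), G, V and W are assumed constant over the forecast horizon.
forecast_k_steps <- function(k, a1, R1, Fm, G, V, W) {
  a <- matrix(a1, ncol = 1)
  R <- R1
  if (k > 1) {
    for (j in 2:k) {
      a <- G %*% a                  # builds up G^(k-1) a_{t+1}
      R <- G %*% R %*% t(G) + W     # builds up R_{t+k|t}
    }
  }
  list(f = t(Fm) %*% a,             # forecast mean f_{t+k|t}
       Q = t(Fm) %*% R %*% Fm + V)  # forecast variance Q_{t+k|t}
}

# Random walk plus noise with a_{t+1} = 20 and R_{t+1} = 1
fc1 <- forecast_k_steps(1, a1 = 20, R1 = matrix(1), Fm = matrix(1),
                        G = matrix(1), V = matrix(2), W = matrix(0.1))
fc5 <- forecast_k_steps(5, a1 = 20, R1 = matrix(1), Fm = matrix(1),
                        G = matrix(1), V = matrix(2), W = matrix(0.1))
```

For the random walk plus noise model the forecast mean stays at 20 for every horizon, while the forecast variance grows by $W = 0.1$ per additional step, from 3 at one step ahead to 3.4 at five steps ahead.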


12.2.4 Smoothing* 


The optimal smoothing algorithm follows from a nice application of elementary
probability. We demonstrate this for one step back, and the general case
proceeds in the same way. To begin with, we use the rule of total probability

$$
p(\theta_{t-1} \mid D_t) = \int p(\theta_{t-1} \mid \theta_t, D_t)\, p(\theta_t \mid D_t)\, d\theta_t \tag{12.10}
$$

The $p(\theta_t \mid D_t)$ in the integrand on the right-hand side is available from the
Kalman filter, so we only need to consider further

$$
p(\theta_{t-1} \mid \theta_t, D_t) = p(\theta_{t-1} \mid \theta_t, D_{t-1}) \tag{12.11}
$$

because $y_t$ provides no further information once $\theta_t$ is known. Now we can
apply Bayes's Theorem:

$$
p(\theta_{t-1} \mid \theta_t, D_{t-1}) =
\frac{p(\theta_t \mid \theta_{t-1}, D_{t-1})\, p(\theta_{t-1} \mid D_{t-1})}{p(\theta_t \mid D_{t-1})}
\tag{12.12}
$$

Finally, given $\theta_t$ and $D_{t-1}$, the denominator on the right-hand side is the
normalising factor, the first term in the numerator on the right-hand side
follows from the system equations, and the second term in the numerator is
the posterior density at time $t-1$, which follows from the Kalman filter. If
we now assume the distributions are normal we obtain the result

$$
\theta_{t-1} \mid D_t \sim N(a_t(-1), R_t(-1)) \tag{12.13}
$$

where

$$
\begin{aligned}
a_t(-1) &= m_{t-1} + B_{t-1}(m_t - a_t) \\
R_t(-1) &= C_{t-1} - B_{t-1}(R_t - C_t)B_{t-1}' \\
B_{t-1} &= C_{t-1} G' R_t^{-1}
\end{aligned}
\tag{12.14}
$$

You can find more details in Pole et al. (1994).
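
As a rough illustration of Equation (12.14), the following base R sketch takes the filter output at times $t-1$ and $t$ and returns the smoothed mean and variance one step back. The name `smooth_one_step_back` is hypothetical, and the numerical values are chosen by us purely for the example.

```r
# Sketch only: smooth_one_step_back() is a hypothetical helper.
# m_prev, C_prev: posterior mean and variance at time t - 1
# m_t, C_t:       posterior mean and variance at time t
# G, W:           constant system matrix and system noise variance
smooth_one_step_back <- function(m_prev, C_prev, m_t, C_t, G, W) {
  a_t <- G %*% m_prev                      # prior mean at time t
  R_t <- G %*% C_prev %*% t(G) + W         # prior variance at time t
  B   <- C_prev %*% t(G) %*% solve(R_t)    # B_{t-1}
  list(a = m_prev + B %*% (m_t - a_t),
       R = C_prev - B %*% (R_t - C_t) %*% t(B))
}

# Random walk plus noise: G = 1, W = 0.1, V = 2
G <- matrix(1); W <- matrix(0.1); V <- 2
m_prev <- matrix(20); C_prev <- matrix(1)

# One Kalman filter step for an observation y_t = 17
R_t <- G %*% C_prev %*% t(G) + W
Q_t <- R_t + V
A_t <- R_t / as.numeric(Q_t)
m_t <- G %*% m_prev + A_t * (17 - as.numeric(G %*% m_prev))
C_t <- R_t - A_t %*% Q_t %*% t(A_t)

sm <- smooth_one_step_back(m_prev, C_prev, m_t, C_t, G, W)
```

The low observation at time $t$ pulls the smoothed estimate for time $t-1$ below the filtered value of 20, and the smoothed variance `sm$R` is smaller than the filtered variance `C_prev` because the later observation carries information about the earlier state.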
12.3 Fitting to simulated univariate time series


12.3.1 Random walk plus noise model 


As a first example, consider a daily stock price taken at the close of each
trading day. This is treated as independent normal random variation, with a
standard deviation of 2 about a mean that is 20 for the first 10 time points
but drops to 10 for time points 11 up to 20. In practice, we never know the
underlying process, and models we fit are based on physical intuition and the
goodness-of-fit to the data. State space models have the great advantage that
the parameters can change over time and are able to allow for a change in
mean level. We first implement a model for the stock price, $y_t$,

$$
\begin{aligned}
y_t &= \theta_t + v_t \\
\theta_t &= \theta_{t-1} + w_t
\end{aligned}
\tag{12.15}
$$

where $\theta_0 \sim N(25, 10)$, $v_t \sim N(0, 2)$, and $w_t \sim N(0, 0.1)$, which allows for small
changes in an underlying mean level $\theta_t$. The SS function in the sspir package
(Dethlefsen and Lundbye-Christensen, 2006) creates a state space object.³
The syntax corresponds precisely to the notation of Equation 12.1 except for
the additional phi parameter, which we do not use in this chapter and can be
ignored. The function kfilter gives the Kalman filter estimate of the state
at each time point, given the preceding observations and the observation at
that time. The function smoother gives the retrospective estimate of the state
at each time given the entire time series.


> library(sspir)
> set.seed(1)
> Plummet.dat <- 20 + 2*rnorm(20) + c(rep(0,10), rep(-10,10))
> n <- length(Plummet.dat)
> Plummet.mat <- matrix(Plummet.dat, nrow = n, ncol = 1)
> m1 <- SS(y = Plummet.mat,
    Fmat = function(tt,x,phi) return( matrix(1) ),
    Gmat = function(tt,x,phi) return( matrix(1) ),
    Wmat = function(tt,x,phi) return( matrix(0.1)),
    Vmat = function(tt,x,phi) return( matrix(2) ),
    m0 = matrix(25), C0 = matrix(10)
  )
> plot(m1$y, ylab = "Closing price", main = "Simulated")
> m1.f <- kfilter(m1)
> m1.s <- smoother(m1.f)
> lines(m1.f$m, lty = 2)
> lines(m1.s$m, lty = 3)
In Figure 12.1, the Kalman filter rapidly settles around 20, and stays there
until time 10, because of the relatively high variance of 10 that we have
attributed to our initial inaccurate estimate of 25. However, the filter is slower
to adapt to the step change. We can improve this performance if we take
advantage of the Bayesian formulation, which is ideally suited for incorporating
our latest information.
Given the drop in mean level at $t = 11$, it would be prudent to review our
assessment of the relevance of the earlier time series to future values. Here,
as an example, we decide to assign a variance of 10 to the evolution of $\theta_t$ at
$t = 12$. This effectively allows the filter to restart at $t = 12$, and in Figure 12.2
you can see that the estimate of $\theta_t$ rapidly settles about 10 after $t = 12$. For
comparison, the filter without intervention is also shown in Figure 12.2. If
you look back at Figure 12.1, you will see the smoothed estimate of $\theta_t$. In
this case, it gives less accurate estimates of $\theta_t$ than when $t < 20$ because the
assumed model, without intervention, makes little allowance for a large step
change. In any application, the filter and smoother must coincide for the latest
observation. The latest filtered value is our best estimate of the mean level
and, for this model, our best estimate of tomorrow's price.

Fig. 12.1. Simulated stock closing prices: filtered estimate of mean level (dashed);
smoothed estimate of mean level (dotted).

³ You will need to download this package from CRAN.

Another means of making the filter adapt more quickly to a change in level
is to increase the variance of $w_t$. The drawback is that the filter will then be
unduly influenced by daily fluctuations in price if the level is constant. It is
the ratio of the variance of $w_t$ to the variance of $v_t$, rather than the absolute
values of the variances, that determines the filter path (Exercise 1). However,
limits of prediction do depend on the absolute values.
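
The dependence on the ratio alone can be seen in a few lines of base R. The sketch below writes out the scalar recursions for the random walk plus noise model and runs them twice, with every variance multiplied by 4 on the second run. The function `local_level_filter` is a hypothetical helper, and scaling the initial variance $C_0$ along with $V$ and $W$ is our assumption, made so that the two paths agree exactly from the first time point.

```r
# Sketch only: local_level_filter() is a hypothetical helper implementing the
# scalar Kalman filter for y_t = theta_t + v_t, theta_t = theta_{t-1} + w_t.
local_level_filter <- function(y, V, W, m0, C0) {
  m <- m0
  C <- C0
  out <- numeric(length(y))
  for (tt in seq_along(y)) {
    R <- C + W                # prior variance
    Q <- R + V                # forecast variance
    A <- R / Q                # weight on the forecast error
    m <- m + A * (y[tt] - m)  # posterior mean
    C <- R - A^2 * Q          # posterior variance
    out[tt] <- m
  }
  out
}

set.seed(1)
y <- 20 + 2 * rnorm(20) + c(rep(0, 10), rep(-10, 10))

path_a <- local_level_filter(y, V = 2, W = 0.1, m0 = 25, C0 = 10)
path_b <- local_level_filter(y, V = 8, W = 0.4, m0 = 25, C0 = 40)  # all variances x 4
same_path <- isTRUE(all.equal(path_a, path_b))
```

The weight `A` is a ratio of variances, so a common scale factor cancels and `same_path` is `TRUE`. The posterior variances do change with the scale, which is why limits of prediction depend on the absolute values.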
> plot(m1$y, ylab = "Closing price", main = "Simulated")
> m1.f <- kfilter(m1)
> lines(m1.f$m, lty = 2)
> m2 <- m1
> Wmat(m2) <- function(tt, x, phi) {
    if (tt == 12) return(matrix(10)) else return(matrix(0.1))
  }
> m2.f <- kfilter(m2)
> lines(m2.f$m, lty = 4)
Fig. 12.2. Simulated stock closing prices: filter after intervention (dot-dash); original
filter (dashed).
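
If the sspir package is not available for your version of R, the intervention can be reproduced approximately with the scalar recursions written out in base R. This is a sketch rather than a replacement for kfilter: `local_level_filter_tv` is a hypothetical helper that accepts either a single value or a vector for the system variance.

```r
# Sketch only: local_level_filter_tv() is a hypothetical helper. W may be a
# single value or a vector giving the system variance at each time point.
local_level_filter_tv <- function(y, V, W, m0, C0) {
  W <- rep_len(W, length(y))  # a single value is recycled over time
  m <- m0
  C <- C0
  out <- numeric(length(y))
  for (tt in seq_along(y)) {
    R <- C + W[tt]            # prior variance, inflated at an intervention
    Q <- R + V
    A <- R / Q
    m <- m + A * (y[tt] - m)
    C <- R - A^2 * Q
    out[tt] <- m
  }
  out
}

set.seed(1)
y <- 20 + 2 * rnorm(20) + c(rep(0, 10), rep(-10, 10))

W_int <- rep(0.1, 20)
W_int[12] <- 10               # intervention: large system variance at t = 12

m_orig <- local_level_filter_tv(y, V = 2, W = 0.1,   m0 = 25, C0 = 10)
m_int  <- local_level_filter_tv(y, V = 2, W = W_int, m0 = 25, C0 = 10)
```

The two paths are identical up to $t = 11$. At $t = 12$ the inflated prior variance gives the forecast error more weight, so `m_int` moves closer to the observation than `m_orig` does. The paths can be added to a plot of `y` with `lines`, as in the listing above.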


In bull markets, stock prices tend to drift upwards. A drift can be incorporated
in the state space model by introducing an additional element in the
state vector. You are asked to do this in Exercise 2.


12.3.2 Regression model with time-varying coefficients 


The time series regression models that we considered in Chapter 5 are based
on an assumption that the process is stationary and hence that the coefficients
are constant. This assumption is particularly equivocal for recent environmental
and economic time series, even if the predictor variables do not explicitly
include time, and state space models are ideally suited to relaxing the restriction.
To demonstrate the procedure, we will generate data from the equations

$$
\begin{aligned}
y_t &= a + b x_t + z_t \\
x_t &= 2 + t/10
\end{aligned}
\tag{12.16}
$$

where $t = 1, \ldots, 30$, $z_t \sim N(0, 1)$, $a = 4$ and $b = 2$ for $t = 1, \ldots, 15$, and
$a = 5$ and $b = -1$ for $t = 16, \ldots, 30$. We fit a straight line with time-varying
coefficients, which is the model recommended to the sales manager in the
house building company of §12.2.1. The components of $\theta_t$ are the intercept
and slope at time $t$, and the estimates from the Kalman filter are shown in
Figure 12.3. In this application, the matrix $F$, Fmat in the SS function, is
time varying and is $(1, x_t)'$. The matrix Gmat is the identity matrix, which
we use diag to generate. The parameter variation, which is modelled with
matrix Wmat, is small relative to the observation variance modelled by Vmat.
The initial guesses for the intercept and slope are 5 and 3, respectively, and
the associated variance of 10 reflects the considerable uncertainty.


> library(sspir)
> set.seed(1)
> x1 <- 1:30
> x1 <- x1/10 + 2
> a <- c(rep(4,15), rep(5,15))
> b <- c(rep(2,15), rep(-1,15))
> n <- length(x1)
> y1 <- a + b * x1 + rnorm(n)
> x0 <- rep(1, n)
> xx <- cbind(x0, x1)
> x.mat <- matrix(xx, nrow = n, ncol = 2)
> y.mat <- matrix(y1, nrow = n, ncol = 1)
> m1 <- SS(y = y.mat, x = x.mat,
    Fmat = function(tt,x,phi)
      return( matrix(c(x[tt,1], x[tt,2]), nrow = 2, ncol = 1)),
    Gmat = function(tt,x,phi) return (diag(2)),
    Wmat = function(tt,x,phi) return (0.1*diag(2)),
    Vmat = function(tt,x,phi) return (matrix(1)),
    m0 = matrix(c(5,3), nrow = 1, ncol = 2), C0 = 10*diag(2)
  )
> m1.f <- kfilter(m1)
> par(mfcol = c(2,1))
> plot(m1.f$m[,1], type = 'l')
> plot(m1.f$m[,2], type = 'l')
The estimates from the Kalman filter rapidly approach the known values,
even after the step change (Fig. 12.3). The estimated standard errors of the
estimates from the filter are based on the specified values in the variance
matrices, rather than being estimated from the data (§12.7). Pole et al. (1994)
give further details of state space models with estimated variances.

Fig. 12.3. Realisation of a regression model in which the intercept and slope change.
Kalman filter estimates of (a) intercept and (b) slope.


12.4 Fitting to univariate time series 


Morgan Stanley share prices for trading days from Monday, August 18, 2008, 
until Friday, November 14, 2008, are in the online file MorgStan.dat. If we 
wish to set up a random walk plus drift model, we need an estimate of the 
two variance components. One way of doing this is to compare the variances 
within and between weeks, the former being taken as V and the latter as 
W (Fig. 12.4). Late 2008 was a tumultuous period for bank shares and the 
variances within and between weeks are estimated as 5.4 and 106, respectively 
(Exercise 3). With these parameter values, both the filtered and smoothed 
values are very close to the observed data. The estimated price of shares on 
Monday, November 17, is given by the latest filtered value m1.f$m[n] and 
equals 12.08, which is close to the 12.03 closing price on Friday, November 
14. If the variances within and between weeks are estimated from the first
four weeks, when the market was relatively stable, they are 2.1 and 1.0. The 
estimated price of shares on Monday, November 17, is now 12.69. 
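
The within-week and between-week comparison can be sketched as follows. Because the MorgStan.dat values are not reproduced here, the sketch uses synthetic prices for 13 complete five-day weeks; the variable names are our own, and the between-week estimate follows the formula of Exercise 3, which subtracts one fifth of the within-week variance from the variance of the weekly means.

```r
# Sketch only: synthetic prices stand in for the Morgan Stanley series,
# and all object names are hypothetical.
set.seed(1)
week  <- rep(1:13, each = 5)                  # week label for each trading day
level <- 40 + cumsum(rnorm(13, sd = 3))       # weekly mean level (random walk)
price <- level[week] + rnorm(65, sd = 1.5)    # daily prices about the weekly level

# Within-week variance: mean of the 13 weekly sample variances
s2_within <- mean(as.vector(tapply(price, week, var)))

# Variance of the 13 weekly means
s2_means <- var(as.vector(tapply(price, week, mean)))

# Between-week variance, removing the part of s2_means due to daily noise
s2_between <- s2_means - s2_within / 5

estimates <- c(V = s2_within, W = s2_between)
```

With real data the between-week estimate can turn out negative when the weekly means barely differ, in which case a small positive value would have to be chosen for W instead.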


Fig. 12.4. Morgan Stanley close of business share prices for trading days August
18 until November 14, 2008. Kalman filter and smoothed values: (a) V = 5.4 and
W = 106; (b) V = 2.2 and W = 1.0.


12.5 Bivariate time series — river salinity 


The Murray River supplies about half of South Australia's urban water needs
and, in dry years, this can increase to as much as ninety percent. Other sources
of water in South Australia are bore water and recycled water, although both
tend to have high salinity. The World Health Organisation (WHO) recommendation
for the upper limit of salinity of potable water is 500 mg/l (approximately
800 EC), but the domestic grey water system and some industrial
and irrigation users can tolerate higher levels of salinity. The low rainfall and
increasing population make the efficient use of water resources a priority,
and there are water-blending schemes that aim to maximise the use of recycled
water. The average monthly salinity, measured by electrical conductivity
(EC; microSiemens per centimetre at 25°C), and flow (Gigalitres per month) at
Chowilla on the Murray River have been calculated from data provided by the
Government of South Australia and are available in the file Murray.txt. Predictions
of salinity are needed for the recycled water schemes to be operated
efficiently, and the changing level of salinity requires an adaptive forecasting
strategy, which is well supported by state space methods.

We have 81 months of flows and salt concentrations at Chowilla and have
been asked to set up a state space model that can adapt to the changing
salt and flow levels that are a feature of the Murray River. Let $S_t$ and $L_t$ be
the mean adjusted salt concentration and river flow for month $t$, respectively.
The mean adjustment generally improves numerical stability and has as a
convenient consequence that the estimates of the intercept terms and estimates
of other coefficients will be approximately uncorrelated. A preliminary time
series analysis found that the model

$$
S_t = \theta_1 + \theta_2 S_{t-1} + \theta_3 L_{t-1} + \theta_4 \cos(2\pi t/12) + \theta_5 \sin(2\pi t/12) + v_{S,t} \tag{12.17}
$$

$$
L_t = \theta_6 + \theta_7 S_{t-1} + \theta_8 L_{t-1} + \theta_9 \cos(2\pi t/12) + \theta_{10} \sin(2\pi t/12) + v_{L,t} \tag{12.18}
$$

provides a good fit to the available data (Exercise 4). We now express this
in state space form, which will allow for the coefficients $\theta_1, \ldots, \theta_{10}$ to change
over time. Writing $cs_t = \cos(2\pi t/12)$ and $sn_t = \sin(2\pi t/12)$,

$$
\begin{pmatrix} S_t \\ L_t \end{pmatrix} =
\begin{pmatrix}
1 & S_{t-1} & L_{t-1} & cs_t & sn_t & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & S_{t-1} & L_{t-1} & cs_t & sn_t
\end{pmatrix}
\begin{pmatrix} \theta_{1,t} & \theta_{2,t} & \cdots & \theta_{10,t} \end{pmatrix}'
+ \begin{pmatrix} v_{S,t} \\ v_{L,t} \end{pmatrix}
\tag{12.19}
$$

$$
\theta_{i,t} = \theta_{i,t-1} + w_{i,t} \tag{12.20}
$$


The R code is now more succinct and makes use of diag to set up diagonal
matrices, but the general principles are the same as for any regression model
with time-varying coefficients. The diagonal elements of the matrix Vmat are
the estimated variances of the errors from the preliminary regression models
for $S_t$ and $L_t$, which are 839 and 1612, respectively. There is no evidence
of autocorrelation in the residual series from the two regressions, but the
cross-correlation at lag 0, which is −0.299, is statistically significant. The
corresponding estimate of the covariance of the errors, the off-diagonal term
in Vmat, is therefore −348. The matrix Wmat is set up to allow the mean levels
to adapt and to allow slight adaptation of the other coefficients. The choice of
values for the variances is somewhat subjective. The mean salinity and mean flow
over the 81-month period were 165 and 259, respectively, and a variance of 10
corresponds to a standard deviation of roughly 2% of mean salinity and 1% of
mean flow. The variances of 0.0001 correspond to a standard deviation of 0.01
for the change in level of the other coefficients. The effect of changing entries
in Wmat can be investigated when setting up the filter. The initial estimates in
m0 were set fairly close to the estimates from the preliminary regressions. The
uncertainty associated with these estimates was arbitrarily set at 100 times
Wmat, as it does not have a critical effect on the performance of the filter. The
results are shown in Figures 12.5 and 12.6 and seem reasonable.
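
The arithmetic behind the entries of Vmat and Wmat can be checked directly from the figures quoted above. In the sketch below the object names are our own; the covariance is the lag 0 cross-correlation multiplied by the two error standard deviations.

```r
# Figures quoted in the text; the object names are hypothetical.
v_salt <- 839      # error variance from the salt regression
v_flow <- 1612     # error variance from the flow regression
r0     <- -0.299   # cross-correlation of the residuals at lag 0

# Covariance = correlation x product of the standard deviations
cov_sf <- r0 * sqrt(v_salt * v_flow)
V_check <- matrix(c(v_salt, cov_sf, cov_sf, v_flow), nrow = 2, ncol = 2)

# Standard deviation implied by a variance of 10, relative to the mean levels
sd_ratio <- sqrt(10) / c(salinity = 165, flow = 259)
```

The covariance evaluates to about −347.7, which rounds to the −348 used in Vmat, and `sd_ratio` is about 0.019 for salinity and 0.012 for flow, in line with the rough figures of 2% and 1%.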


> library(sspir)
> www <- 'http://www.massey.ac.nz/~pscowper/ts/Murray.txt'
> Salt.dat <- read.table(www, header=T) ; attach(Salt.dat)
> n <- 81 ; Time <- 1:n
> SIN <- sin(2 * pi * Time /12)[-1]
> COS <- cos(2 * pi * Time /12)[-1]
> Chowilla <- Chowilla - mean(Chowilla)
> Flow <- Flow - mean(Flow)
> Chow <- Chowilla[-1]
> Chow.L1 <- Chowilla[-n]
> Flo <- Flow[-1]
> Flo.L1 <- Flow[-n]
> Sal.mat <- matrix(c(Chow, Flo), nrow = 80, ncol = 2)
> x0 <- rep(1, (n-1))
> xx <- cbind(x0, Chow.L1, Flo.L1, COS, SIN)
> x.mat <- matrix(xx, nrow = n-1, ncol = 5)
> G.mat <- diag(10)
> W.mat <- diag(rep(c(10, 0.0001, 0.0001, 0.0001, 0.0001), 2))
> m1 <- SS(y = Sal.mat, x = x.mat,
    Fmat =
      function(tt, x, phi) return (matrix(
        c(x[tt,1], x[tt,2], x[tt,3], x[tt,4], x[tt,5], rep(0,10),
          x[tt,1], x[tt,2], x[tt,3], x[tt,4], x[tt,5]),
        nrow=10, ncol=2)),
    Gmat = function(tt, x, phi) return (G.mat),
    Wmat = function(tt, x, phi) return (W.mat),
    Vmat = function(tt, x, phi)
      return (matrix(c(839, -348, -348, 1612), nrow=2, ncol=2)),
    m0 = matrix(c(0,0.9,0.1,-15,-10,0,0,0.7,30,20), nrow=1, ncol=10),
    C0 = 100 * W.mat
  )
> m1.f <- kfilter(m1)
> par(mfcol=c(2,3))
> plot(m1.f$m[,1], type='l')
> plot(m1.f$m[,2], type='l')
> plot(m1.f$m[,3], type='l')
> plot(m1.f$m[,6], type='l')
> plot(m1.f$m[,7], type='l')
> plot(m1.f$m[,8], type='l')
> par(mfcol=c(2,2))
> plot(m1.f$m[,4], type='l')
> plot(m1.f$m[,5], type='l')
> plot(m1.f$m[,9], type='l')
> plot(m1.f$m[,10], type='l')
Fig. 12.5. Kalman filter estimates of parameters for the Murray River salt and
flow model: (a) deviation of salt from mean; (b) autoregressive coefficient for salt; 
(c) cross regressive coefficient for flow; (d) deviation of flow from mean; (e) cross 
regressive coefficient for salt; (f) autoregressive coefficient for flow. 


12.6 Estimating the variance matrices 


In the applications considered in this chapter, we have had to specify values for
the $V$ and $W$ matrices when setting up the state space model. We can adapt
the algorithm to update these values as we obtain more data. The formula for
the variance of one-step-ahead forecast errors depends on the known values of
entries in the $V$ and $W$ matrices. This theoretical variance can be compared
with the variance of the actual forecast errors up until time $t$. Define

$$
\phi_t = \frac{\text{actual variance of forecast errors up to time } t}{\text{theoretical variance of forecast errors}}
$$

Then updated matrices for $V$ and $W$ are obtained from $\phi_t V$ and $\phi_t W$,
respectively. This strategy will not adjust the relative variances of measurement
noise and system noise, but it does allow the absolute values to be updated
(Exercise 7).
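
One possible way of computing the ratio for a univariate series is sketched below. The definition above does not say how the theoretical variance should be summarised when $Q_t$ changes over time, so the sketch assumes, as our own choice, that the mean squared forecast error up to time $t$ is divided by the mean of the theoretical variances $Q_1, \ldots, Q_t$. The name `variance_ratio` is hypothetical.

```r
# Sketch only: variance_ratio() is a hypothetical helper.
# e: one-step-ahead forecast errors e_1, ..., e_n from a univariate filter
# Q: the corresponding theoretical forecast variances Q_1, ..., Q_n
variance_ratio <- function(e, Q) {
  tt <- seq_along(e)
  actual      <- cumsum(e^2) / tt   # mean squared forecast error up to time t
  theoretical <- cumsum(Q) / tt     # mean theoretical variance up to time t
  actual / theoretical
}

# Illustration: the specified variances are too small by a factor of 4
set.seed(1)
Q <- rep(3, 500)
e <- rnorm(500, mean = 0, sd = sqrt(4 * 3))
phi <- variance_ratio(e, Q)

# Rescaled variances after 500 observations, starting from V = 2 and W = 0.1
V_new <- phi[500] * 2
W_new <- phi[500] * 0.1
```

In this illustration `phi[500]` settles near 4, so both variances are scaled up by about that factor while their ratio, and hence the filter path, is unchanged.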


Fig. 12.6. Kalman filter estimates of coefficients of seasonal components for Murray
River salt and flow model: (a) cosine for salt; (b) sine for salt; (c) cosine for flow;
(d) sine for flow.


12.7 Discussion 


One of the main advantages of state space models is that they are adaptive, 
and the benefits of this are usually realised by implementing them in real 
time. We have only covered relatively straightforward examples, and there 
are many useful extensions. In particular, there are sophisticated methods 
for estimating the variances rather than specifying them and methods for 
estimating parameters in the F and G matrices as well as the states when 
this is theoretically possible. The distinction between states and unknown 
parameters depends on the application (see Exercise 5). 

The Kalman filter applies to linear systems, but it can also be used as a
local linear approximation to a non-linear system. This important develop- 
ment is known as the extended Kalman filter. The optimality of the standard 
Kalman filter rests on an assumption that the noise is Gaussian (normal) 
and independent, but similar optimal filters have been developed with other 
assumptions about noise distributions. 
12.8 Summary of additional commands used


SS        Sets up state space model
kfilter   Runs the Kalman filter
smoother  Creates smoothed estimates of past values


12.9 Exercises


1. Refer to the R script to simulate stock price in §12.3.1.

   a) Change the variance of $w_t$ from 0.1 to 10, and comment on the change
      in the filter and smoother paths.

   b) Change the variance of $w_t$ from 0.1 to 10 and the variance of $v_t$ from 2
      to 200, and comment on the change in the filter and smoother paths.


2. Suggest a means of incorporating a drift in the random walk plus noise
   model by introducing a second element in the state $\theta_t$. Apply your model
   to the Morgan Stanley share price series.


3. The 64 Morgan Stanley share prices are from trading days over 13 weeks,
   Monday, September 1, being the Labor Day holiday.

   a) Calculate the variance for each of the 13 weeks, all but the third
      consisting of five trading days. The mean of these variances is the
      estimate of the within-week variance $s^2_{\text{within}}$.

   b) Calculate the mean for each of the weeks and calculate the variance
      of the 13 means. Denote this by $s^2_{\text{mean}}$.

   c) Estimate the variance between weeks as

      $$
      s^2_{\text{between}} = s^2_{\text{mean}} - \tfrac{1}{5}\, s^2_{\text{within}}
      $$

      The slight inaccuracy that results from one week having only four
      trading days is negligible.

   d) If you are familiar with the analysis of variance, use aov to obtain
      estimates of the within-week variances that do allow for the Labor
      Day holiday.


4. Calculate the preliminary regressions for the Murray River salinity example,
   and verify the numbers given in the text.


5. In many control applications, the matrices $F_t$ and $G_t$ are constant. The
   first issue is whether or not all the values of the state at time $t$ can be
   inferred from the observations at time $t$. If they can be inferred, the system
   is said to be observable. The linear system in Equation (12.1) is observable
   if the observability matrix, $O$, defined by

   $$
   O = \begin{pmatrix} F' \\ F'G \\ \vdots \\ F'G^{p-1} \end{pmatrix}
   $$

   has full rank, $p$. Consider a state space model for mean adjusted salinity
   ($S_t$) and flow ($L_t$) in which the coefficients are assumed known and the
   salinity and flow are the components of the state. Suppose only flow is
   measured, and assume the model has the form

   $$
   L_t = \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} S_t \\ L_t \end{pmatrix} + v_t
   $$

   $$
   \begin{pmatrix} S_t \\ L_t \end{pmatrix} =
   \begin{pmatrix} a & b \\ c & d \end{pmatrix}
   \begin{pmatrix} S_{t-1} \\ L_{t-1} \end{pmatrix} +
   \begin{pmatrix} w_{S,t} \\ w_{L,t} \end{pmatrix}
   $$

   Can the salinity at time $t$ be inferred from the flow measurement? If so,
   what are the conditions on $a$, $b$, $c$, and $d$? (In time series applications, $F_t$
   will generally contain the values of the predictor variables at time $t$. The
   observability requirement at time $t$ is that the $t \times 1$ columns of predictors
   in the linear regression model be linearly independent.)


6. Suppose that an observation $y$ has a normal distribution with mean $\theta$
   and variance $\phi$ and that the prior distribution for $\theta$ is normal with mean
   $\theta_0$ and variance $\phi_0$. We require the posterior distribution of $\theta$ given the
   observation $y$. From Bayes's Theorem,

   $$
   p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta)
   \propto \exp\left[ -\frac{(\theta - \theta_0)^2}{2\phi_0} - \frac{(y - \theta)^2}{2\phi} \right]
   $$

   It is now convenient to anticipate the result, that the mean of the posterior
   distribution ($\theta_1$) is a weighted mean of the mean of the prior distribution
   and the observation with weights proportional to the precision, and define
   $\theta_1$ and $\phi_1$ as

   $$
   \theta_1 = \frac{\phi_0^{-1}\theta_0 + \phi^{-1} y}{\phi_0^{-1} + \phi^{-1}}, \qquad
   \phi_1^{-1} = \phi_0^{-1} + \phi^{-1}
   $$

   Now use these expressions to replace $\theta_0$ and $\phi_0$ in the expression for $p(\theta \mid y)$
   by $\theta_1$ and $\phi_1$:

   $$
   p(\theta \mid y) \propto \exp\left[ -\frac{\theta^2 - 2\theta\theta_1}{2\phi_1} \right]
   $$

   This is proportional to a normal distribution with mean $\theta_1$ and variance
   $\phi_1$ since $\theta_1^2/\phi_1$ does not depend on $\theta$. So,

   $$
   p(\theta \mid y) = \frac{1}{\sqrt{2\pi\phi_1}} \exp\left[ -\frac{(\theta - \theta_1)^2}{2\phi_1} \right]
   $$

   The extension of this result to the multivariate normal distribution can
   be used to derive the Kalman filter.
7. Verify that the following script is a Kalman filter. Compare its performance
   with kfilter on the simulated regression example. Adapt the filter
   to update the variance and investigate its performance.

> set.seed(1)
> x1 <- c(1:30)
> x1 <- x1/10 + 2
> a <- 4
> b <- 2
> n <- length(x1)
> y1 <- a + b * x1 + 0.1 * rnorm(n)
> x0 <- rep(1, n)
> xx <- cbind(x0, x1)
> F <- matrix(xx, nrow = n, ncol = 2)
> y <- matrix(y1, nrow = n, ncol = 1)
> G <- matrix(c(1,0,0,1), nrow = 2, ncol = 2)
> W <- matrix(c(1,0,0,1), nrow = 2, ncol = 2)
> V <- matrix(1)
> m0 <- matrix(c(5,1.5), nrow = 2, ncol = 1)
> C0 <- matrix(c(.1,0,0,.1), nrow = 2, ncol = 2)
> a <- 0;R <- 0;f <- 0;Q <- 0;e <- 0;A <- 0;m <- 0;C <- 0;tt <- 0;
> Kfilt.m <- cbind(rep(0, n), rep(0, n))
> m <- m0
> C <- C0
> for (tt in 1:n) {
    Fmat <- matrix(c(F[tt,1],F[tt,2]), nrow = 2, ncol = 1)
    a <- G %*% m
    R <- G %*% C %*% t(G) + W
    f <- t(Fmat) %*% a
    Q <- t(Fmat) %*% R %*% Fmat + V
    e <- y[tt]-f
    A <- R %*% Fmat %*% solve(Q)
    m <- a + A %*% e
    C <- R - A %*% Q %*% t(A)
    Kfilt.m[tt,1] <- m[1,1]
    Kfilt.m[tt,2] <- m[2,1]
  }
> plot(Kfilt.m[1:n, 1])
> plot(Kfilt.m[1:n, 2])